Preface / Acknowledgment


The present expanded set of notes initially grew out of an attempt to 
flesh out the International Baccalaureate (IB) mathematics “Further 
Mathematics” curriculum, all in preparation for my teaching this
during the 2007-2008 school year. Such a course is offered only
under special circumstances and is typically reserved for those rare stu- 
dents who have finished their second year of IB mathematics HL in 
their junior year and need a “capstone” mathematics course in their 
senior year. During the above school year I had two such IB math- 
ematics students. However, feeling that a few more students would 
make for a more robust learning environment, I recruited several of my 
2006-2007 AP Calculus (BC) students to partake of this rare offering.
The result was one of the most singular experiences I’ve had
in my nearly 40-year teaching career: the brain power represented in 
this class of 11 blue-chip students surely rivaled that of any assemblage 
of high-school students anywhere and at any time! 

After having already finished the first draft of these notes I became 
aware that there was already a book in print which gave adequate 
coverage of the IB syllabus, namely the Haese and Harris text,¹ which
covered the four IB Mathematics HL “option topics,” together with a 
chapter on the retired option topic on Euclidean geometry. This is a 
very worthy text and had I initially known of its existence, I probably 
wouldn’t have undertaken the writing of the present notes. However, as 
time passed, and I became more aware of the many differences between 
mine and the HH text’s views on high-school mathematics, I decided 
that there might be some value in trying to codify my own personal 
experiences into an advanced mathematics textbook accessible by and 
interesting to a relatively advanced high-school student, without being 
constrained by the idiosyncrasies of the formal IB Further Mathematics
curriculum. This allowed me to freely draw from my experiences first as 
a research mathematician and then as an AP/IB teacher to weave some 
of my all-time favorite mathematical threads into the general narrative, 
thereby giving me (and, I hope, the students) better emotional and
intellectual rapport with the contents. I can only hope that the readers
(if any) can find something of value by the reading of my
stream-of-consciousness narrative.

¹Peter Blythe, Peter Joseph, Paul Urban, David Martin, Robert Haese, and Michael Haese,
MATHEMATICS FOR THE INTERNATIONAL STUDENT; MATHEMATICS HL (OPTIONS), Haese and
Harris Publications, 2005, Adelaide, ISBN 1 876543 33 7


The basic layout of my notes originally was constrained to the five 
option themes of IB: geometry, discrete mathematics, abstract alge- 
bra, series and ordinary differential equations, and inferential statistics. 
However, I have since added a short chapter on inequalities and con- 
strained extrema as they amplify and extend themes typically visited 
in a standard course in Algebra II. As for the IB option themes, my 
organization differs substantially from that of the HH text. Theirs is 
one in which the chapters are independent of each other, having very 
little articulation among the chapters. This makes their text especially 
suitable for the teaching of any given option topic within the context 
of IB mathematics HL. Mine, on the other hand, tries to bring out 
the strong interdependencies among the chapters. For example, the 
HH text places the chapter on abstract algebra (Sets, Relations, and 
Groups) before discrete mathematics (Number Theory and Graph The- 
ory), whereas I feel that the correct sequence is the other way around. 
Much of the motivation for abstract algebra can be found in a variety 
of topics from both number theory and graph theory. As a result, the 
reader will find that my Abstract Algebra chapter draws heavily from 
both of these topics for important examples and motivation. 


As another important example, HH places Statistics well before Se- 
ries and Differential Equations. This can be done, of course (they did 
it!), but there’s something missing in inferential statistics (even at the 
elementary level) if there isn’t a healthy reliance on analysis. In my or- 
ganization, this chapter (the longest one!) is the very last chapter and 
immediately follows the chapter on Series and Differential Equations. 
This made more natural, for example, an insertion of a theoretical
subsection wherein the density of the sum of two independent continuous
random variables is derived as the convolution of the individual densities. A
second, and perhaps more relevant example involves a short treatment 
on the “random harmonic series,” which dovetails very well with the 
already-understood discussions on convergence of infinite series. The 
cute fact, of course, is that the random harmonic series converges with
probability 1.

I would like to acknowledge the software used in the preparation of
these notes. First of all, the typesetting itself made use of the industry
standard, LaTeX, which is built on TeX, written by Donald Knuth. Next, I made use of
three different graphics resources: Geometer’s Sketchpad, Autograph, 
and the statistical workhorse Minitab. Not surprisingly, in the chapter 
on Advanced Euclidean Geometry, the vast majority of the graphics 
was generated through Geometer’s Sketchpad. I like Autograph as a 
general-purpose graphics software and have made rather liberal use of 
this throughout these notes, especially in the chapters on series and 
differential equations and inferential statistics. Minitab was used pri- 
marily in the chapter on Inferential Statistics, and the graphical outputs 
greatly enhanced the exposition. Finally, all of the graphics were con- 
verted to PDF format via ADOBE® ACROBAT® 8 PROFESSIONAL 
(version 8.0.0). I owe a great debt to those involved in the production 
of the above-mentioned products. 


Assuming that I have already posted these notes to the internet, I 
would appreciate comments, corrections, and suggestions for improve- 
ments from interested colleagues and students alike. The present ver- 
sion still contains many rough edges, and I’m soliciting help from the 
wider community to help identify improvements. 


Naturally, my greatest debt of gratitude is to the eleven students
(shown to the right) I conscripted for the class. They are (back row):
Eric Zhang (Harvey Mudd), Jong-Bin Lim (University of Illinois),
Timothy Sun (Columbia University), David Xu (Brown University),
Kevin Yeh (UC Berkeley), Jeremy Liu (University of Virginia);
(front row): Jong-Min Choi (Stanford University), T.J. Young
(Duke University), Nicole Wong (UC Berkeley), Emily Yeh (University
of Chicago), and Jong Fang (Washington University). Besides providing
one of the most stimulating teaching environments I’ve enjoyed over
my 40-year career, these students pointed out countless errors in this
document’s original draft. To them I owe an un-repayable debt.

My list of acknowledgements would be woefully incomplete without 
special mention of my life-long friend and colleague, Professor Robert 
Burckel, who over the decades has exerted tremendous influence on how 
I view mathematics. 


David Surowski
May 25, 2008

Chapter 1


Advanced Euclidean Geometry 


1.1 Role of Euclidean Geometry in High-School 
Mathematics 


If only because in one’s “further” studies of mathematics, the results 
(i.e., theorems) of Euclidean geometry appear only infrequently, this 
subject has come under frequent scrutiny, especially over the past 50 
years, and at various stages its very inclusion in a high-school mathe- 
matics curriculum has even been challenged. However, as long as we 
continue to regard as important the development of logical, deductive 
reasoning in high-school students, then Euclidean geometry provides as 
effective a vehicle as any in bringing forth this worthy objective. 

The lofty position ascribed to deductive reasoning goes back to at 
least the Greeks, with Aristotle having laid down the basic foundations 
of such reasoning back in the 4th century B.C. At about this time Greek 
geometry started to flourish, and reached its zenith with the 13 books 
of Euclid. From this point forward, geometry (and arithmetic) was an 
obligatory component of one’s education and served as a paradigm for 
deductive reasoning. 

A well-known (but not well enough known!) anecdote describes for- 
mer U.S. president Abraham Lincoln who, as a member of Congress, 
had nearly mastered the first six books of Euclid. By his own admis- 
sion this was not a statement of any particular passion for geometry, 
but that such mastery gave him a decided edge over his counterparts
in dialectics and logical discourse.

Lincoln was not the only U.S. president to have given serious thought
to Euclidean geometry. President James Garfield published a novel
proof in 1876 of the Pythagorean theorem (see Exercise 3 on page 4).


As for the subject itself, it is my personal feeling that the logical 
arguments which connect the various theorems of geometry are every 
bit as fascinating as the theorems themselves! 


So let’s get on with it ... ! 


1.2 Triangle Geometry 


1.2.1 Basic notations 


We shall gather together a few notational conventions and be reminded 
of a few simple results. Some of the notation is as follows: 


$A, B, C$                                 labels of points
$[AB]$                                    the line segment joining $A$ and $B$
$AB$                                      the length of the segment $[AB]$
$(AB)$                                    the line containing $A$ and $B$
$\widehat{A}$                             the angle at $A$
$C\widehat{A}B$                           the angle between $[CA]$ and $[AB]$
$\triangle ABC$                           the triangle with vertices $A$, $B$, and $C$
$\triangle ABC \cong \triangle A'B'C'$    the triangles $\triangle ABC$ and $\triangle A'B'C'$ are congruent
$\triangle ABC \sim \triangle A'B'C'$     the triangles $\triangle ABC$ and $\triangle A'B'C'$ are similar


1.2.2 The Pythagorean theorem


One of the most fundamental results is the well-known
Pythagorean Theorem. This states that $a^2 + b^2 = c^2$ in a right
triangle with sides $a$ and $b$ and hypotenuse $c$. The figure to the
right indicates one of the many known proofs of this fundamental
result. Indeed, the area of the “big” square is $(a + b)^2$ and can be
decomposed into the area of the smaller square plus the areas of the
four congruent triangles. That is,

$$(a+b)^2 = c^2 + 2ab,$$

which immediately reduces to $a^2 + b^2 = c^2$.


Next, we recall the equally well- 
known result that the sum of the 
interior angles of a triangle is 180°. 
The proof is easily inferred from the 
diagram to the right. 


EXERCISES


1. Prove Euclid’s Theorem for Proportional Segments, i.e.,
   given the right triangle $\triangle ABC$ as indicated, then

   $$h^2 = pq, \quad a^2 = pc, \quad b^2 = qc.$$

2. Prove that the sum of the interior angles of a quadrilateral $ABCD$
   is 360°.

3. In the diagram to the right, $\triangle ABC$ is a right triangle, segments
   $[AB]$ and $[AF]$ are perpendicular and equal in length, and $[EF]$ is
   perpendicular to $[CE]$. Set $a = BC$, $b = AC$, $c = AB$, and deduce
   President Garfield’s proof¹ of the Pythagorean theorem by computing
   the area of the trapezoid $BCEF$.


1.2.3 Similarity 


In what follows, we’ll see that many—if not most—of our results shall 
rely on the proportionality of sides in similar triangles. A convenient 
statement is as follows. 


Similarity. Given the similar triangles $\triangle ABC \sim \triangle A'BC'$, we have
that

$$\frac{A'B}{AB} = \frac{BC'}{BC} = \frac{A'C'}{AC}.$$

Conversely, if

$$\frac{A'B}{AB} = \frac{BC'}{BC} = \frac{A'C'}{AC},$$

then the triangles $\triangle ABC$ and $\triangle A'BC'$ are similar.


¹James Abram Garfield (1831-1881) published this proof in 1876 in the JOURNAL OF EDUCATION
(Volume 3 Issue 161) while a member of the House of Representatives. He was assassinated in 1881
by Charles Julius Guiteau. As an aside, notice that Garfield’s diagram also provides a simple proof
of the fact that perpendicular lines in the plane have slopes which are negative reciprocals.

PROOF. Note first that $\triangle AA'C'$ and $\triangle CA'C'$ clearly have the same
areas, which implies that $\triangle ABC'$ and $\triangle CA'B$ have the same area
(being the previous common area plus the area of the common triangle
$\triangle A'BC'$). Therefore

$$\frac{A'B}{AB} = \frac{\frac12 h \cdot A'B}{\frac12 h \cdot AB}
  = \frac{\text{area } \triangle A'BC'}{\text{area } \triangle ABC'}
  = \frac{\text{area } \triangle A'BC'}{\text{area } \triangle CA'B}
  = \frac{\frac12 h' \cdot BC'}{\frac12 h' \cdot BC}
  = \frac{BC'}{BC}.$$

In an entirely similar fashion one can prove that

$$\frac{A'B}{AB} = \frac{A'C'}{AC}.$$

Conversely, assume that

$$\frac{A'B}{AB} = \frac{BC'}{BC}.$$

In the figure to the right, the point $C''$ has been located so that the
segment $[A'C'']$ is parallel to $[AC]$. But then triangles $\triangle ABC$ and
$\triangle A'BC''$ are similar, and so

$$\frac{BC''}{BC} = \frac{A'B}{AB} = \frac{BC'}{BC},$$

i.e., that $BC'' = BC'$. This clearly implies that $C' = C''$, and so $[A'C']$
is parallel to $[AC]$. From this it immediately follows that triangles
$\triangle ABC$ and $\triangle A'BC'$ are similar.


EXERCISES


1. Let $\triangle ABC$ and $\triangle A'B'C'$ be given with $A\widehat{B}C = A'\widehat{B'}C'$ and

   $$\frac{A'B'}{AB} = \frac{B'C'}{BC}.$$

   Then $\triangle ABC \sim \triangle A'B'C'$.

2. In the figure to the right, $AD = rAB$, $AE = sAC$. Show that

   $$\frac{\text{Area } \triangle ADE}{\text{Area } \triangle ABC} = rs.$$

3. Let $\triangle ABC$ be a given triangle and let $Y$, $Z$ be the midpoints of
   $[AC]$, $[AB]$, respectively. Show that $(YZ)$ is parallel with $(BC)$.
   (This simple result is sometimes called the Midpoint Theorem.)

4. In $\triangle ABC$, you are given that

   $$\frac{AY}{YC} = \frac{CX}{XB} = \frac{BZ}{ZA} = \frac{1}{x},$$

   where $x$ is a positive real number. Assuming that the area of $\triangle ABC$
   is 1, compute the area of $\triangle XYZ$ as a function of $x$.

5. Let $ABCD$ be a quadrilateral and let $EFGH$ be the quadrilateral
   formed by connecting the midpoints of the sides of $ABCD$. Prove
   that $EFGH$ is a parallelogram.

6. In the figure to the right, $ABCD$ is a parallelogram, and $E$ is a point
   on the segment $[AD]$. The point $F$ is the intersection of lines $(BE)$
   and $(CD)$. Prove that $AB \times FB = CF \times BE$.

7. In the figure to the right, tangents to the circle at $B$ and $C$ meet at the
   point $A$. A point $P$ is located on the minor arc $BC$ and the tangent
   to the circle at $P$ meets the lines $(AB)$ and $(AC)$ at the points $D$ and
   $E$, respectively. Prove that $D\widehat{O}E = \frac12 B\widehat{O}C$, where $O$ is the center of the
   given circle.


1.2.4 “Sensed” magnitudes; The Ceva and Menelaus theo- 
rems 


In this subsection it will be convenient to consider the magnitude $AB$ of
the line segment $[AB]$ as “sensed,”² meaning that we shall regard $AB$
as being either positive or negative and having absolute value equal to
the usual magnitude of the line segment $[AB]$. The only requirement
that we place on the signed magnitudes is that if the points $A$, $B$, and
$C$ are collinear, then

$$AB \times BC \;
  \begin{cases}
    > 0 & \text{if } \overrightarrow{AB} \text{ and } \overrightarrow{BC} \text{ are in the same direction,} \\
    < 0 & \text{if } \overrightarrow{AB} \text{ and } \overrightarrow{BC} \text{ are in opposite directions.}
  \end{cases}$$

This implies in particular that for signed magnitudes,

$$\frac{AB}{BA} = -1.$$

²IB uses the language “sensed” rather than the more customary “signed.”

Before proceeding further, the reader should pay special attention 
to the ubiquity of “dropping altitudes” as an auxiliary construction. 


Both of the theorems of this subsec- 
tion are concerned with the following 
configuration: we are given the trian- 
gle AABC and points X, Y, and Z on 
the lines (BC), (AC), and (AB), respec- 
tively. Ceva’s Theorem is concerned with 
the concurrency of the lines (AX), (BY), 
and (CZ). Menelaus’ Theorem is con- 
cerned with the colinearity of the points 
X, Y, and Z. Therefore we may regard these theorems as being “dual” 
to each other. 


In each case, the relevant quantity to consider shall be the product

$$\frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA}.$$

Note that each of the factors above is nonnegative precisely when the
points $X$, $Y$, and $Z$ lie on the segments $[BC]$, $[AC]$, and $[AB]$,
respectively.


The proof of Ceva’s theorem will be greatly facilitated by the following
lemma:


LEMMA. Given the triangle $\triangle ABC$, let $X$ be the intersection of
a line through $A$ and meeting $(BC)$. Let $P$ be any other point on $(AX)$.
Then

$$\frac{\text{area } \triangle APB}{\text{area } \triangle APC} = \frac{BX}{CX}.$$

PROOF. In the diagram to the right, altitudes $BR$ and $CS$ have
been constructed. From this, we see that

$$\frac{\text{area } \triangle APB}{\text{area } \triangle APC}
  = \frac{\frac12 AP \cdot BR}{\frac12 AP \cdot CS}
  = \frac{BR}{CS}
  = \frac{BX}{CX},$$

where the last equality follows from the obvious similarity
$\triangle BRX \sim \triangle CSX$.


Note that the above proof doesn’t depend on where the line $(AP)$
intersects $(BC)$, nor does it depend on the position of $P$ relative to the
line $(BC)$, i.e., it can be on either side.


CEVA’S THEOREM. Given the triangle $\triangle ABC$, lines (usually called
Cevians) are drawn from the vertices $A$, $B$, and $C$, with $X$, $Y$, and $Z$
being the points of intersection with the lines $(BC)$, $(AC)$, and $(AB)$,
respectively. Then $(AX)$, $(BY)$, and $(CZ)$ are concurrent if and only
if

$$\frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA} = +1.$$


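PROOF. Assume that the lines in question are concurrent, meeting in
the point $P$. We then have, applying the above lemma three times,
that

$$1 = \frac{\text{area } \triangle APC}{\text{area } \triangle BPC} \cdot
      \frac{\text{area } \triangle APB}{\text{area } \triangle APC} \cdot
      \frac{\text{area } \triangle BPC}{\text{area } \triangle BPA}
    = \frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA}.$$

To prove the converse we need to prove that the lines $(AX)$, $(BY)$,
and $(CZ)$ are concurrent, given that

$$\frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA} = 1.$$

Let $Q = (AX) \cap (BY)$, $Z' = (CQ) \cap (AB)$. Then $(AX)$, $(BY)$,
and $(CZ')$ are concurrent and so

$$\frac{AZ'}{Z'B} \times \frac{BX}{XC} \times \frac{CY}{YA} = 1,$$

which forces

$$\frac{AZ'}{Z'B} = \frac{AZ}{ZB}.$$

This clearly implies that $Z = Z'$, proving that the original lines $(AX)$, $(BY)$,
and $(CZ)$ are concurrent.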
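
As a numerical sanity check on Ceva's theorem, the following minimal sketch computes the product of sensed ratios for Cevians drawn through a chosen point $P$. The helper names (`intersect`, `sensed_ratio`, `ceva_product`) and the sample coordinates are illustrative assumptions, not part of the original notes; exact rational arithmetic is used so that the product comes out as exactly 1.

```python
from fractions import Fraction

def intersect(p1, p2, p3, p4):
    """Intersection of line (p1 p2) with line (p3 p4); assumes they are not parallel."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    d = Fraction((x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4))
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)

def sensed_ratio(p, m, q):
    """Sensed ratio PM/MQ for collinear points p, m, q (with m different from q)."""
    i = 0 if m[0] != q[0] else 1  # pick a coordinate in which m and q differ
    return Fraction(m[i] - p[i]) / (q[i] - m[i])

def ceva_product(A, B, C, P):
    """(AZ/ZB)(BX/XC)(CY/YA) for the Cevians through P (hypothetical helper)."""
    X = intersect(A, P, B, C)  # X on (BC)
    Y = intersect(B, P, A, C)  # Y on (AC)
    Z = intersect(C, P, A, B)  # Z on (AB)
    return sensed_ratio(A, Z, B) * sensed_ratio(B, X, C) * sensed_ratio(C, Y, A)

# Sample triangle and interior point (arbitrary choices).
A, B, C, P = (0, 0), (1, 5), (6, 0), (2, 1)
print(ceva_product(A, B, C, P))  # prints 1
```

Choosing $P$ outside the triangle, for instance `(3, -2)`, makes two of the three ratios negative, but the product is still 1, in line with the sign remark preceding the lemma.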


Menelaus’ theorem is a dual version of Ceva’s theorem and concerns
not lines (i.e., Cevians) but rather points on the (extended) edges of
the triangle. When these three points are collinear, the line formed
is called a transversal. The reader can quickly convince herself that
there are two configurations related to $\triangle ABC$:

[Figure: the two configurations of a transversal through $X$, $Y$, and $Z$ relative to $\triangle ABC$.]

As with Ceva’s theorem, the relevant quantity is the product of the
sensed ratios:

$$\frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA};$$

in this case, however, we see that either one or three of the ratios must
be negative, corresponding to the two figures given above.


MENELAUS’ THEOREM. Given the triangle $\triangle ABC$ and given points
$X$, $Y$, and $Z$ on the lines $(BC)$, $(AC)$, and $(AB)$, respectively, then
$X$, $Y$, and $Z$ are collinear if and only if

$$\frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA} = -1.$$

PROOF. As indicated above, there are two cases to consider. The first
case is that in which two of the points $X$, $Y$, or $Z$ are on the triangle’s
sides, and the second is that in which none of $X$, $Y$, or $Z$ are on the
triangle’s sides. The proofs of these cases are formally identical, but
for clarity’s sake we consider them separately.

CASE 1. We assume first that $X$, $Y$, and $Z$ are collinear and drop
altitudes $h_1$, $h_2$, and $h_3$ as indicated in the figure to the right. Using
obvious similar triangles, we get

$$\frac{AZ}{ZB} = +\frac{h_1}{h_2}; \quad \frac{BX}{XC} = +\frac{h_2}{h_3}; \quad \frac{CY}{YA} = -\frac{h_3}{h_1},$$

in which case we clearly obtain

$$\frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA} = -1.$$

To prove the converse, we may assume that $X$ is on $[BC]$, $Z$ is on
$[AB]$, and that $Y$ is on $(AC)$ with
$\frac{AZ}{ZB} \cdot \frac{BX}{XC} \cdot \frac{CY}{YA} = -1$. We let $X'$ be the
intersection of $(ZY)$ with $[BC]$ and infer from the above that

$$\frac{AZ}{ZB} \cdot \frac{BX'}{X'C} \cdot \frac{CY}{YA} = -1.$$

It follows that $\frac{BX}{XC} = \frac{BX'}{X'C}$, from which we infer easily that $X = X'$, and
so $X$, $Y$, and $Z$ are collinear.

CASE 2. Again, we drop altitudes from $A$, $B$, and $C$ and use obvious
similar triangles, to get

$$\frac{AZ}{ZB} = -\frac{h_1}{h_2}; \quad \frac{BX}{XC} = -\frac{h_2}{h_3}; \quad \frac{AY}{YC} = -\frac{h_1}{h_3};$$

it follows immediately that

$$\frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA} = -1.$$

The converse is proved exactly as above.
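
Menelaus' theorem can be checked numerically in the same spirit. The sketch below is only an illustration under assumed names (`intersect`, `sensed_ratio`, `menelaus_product`) and arbitrarily chosen coordinates: it cuts a sample triangle with a transversal through two given points and evaluates the product of sensed ratios exactly.

```python
from fractions import Fraction

def intersect(p1, p2, p3, p4):
    """Intersection of line (p1 p2) with line (p3 p4); assumes they are not parallel."""
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = p1, p2, p3, p4
    d = Fraction((x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4))
    a = x1 * y2 - y1 * x2
    b = x3 * y4 - y3 * x4
    return ((a * (x3 - x4) - (x1 - x2) * b) / d,
            (a * (y3 - y4) - (y1 - y2) * b) / d)

def sensed_ratio(p, m, q):
    """Sensed ratio PM/MQ for collinear points p, m, q (with m different from q)."""
    i = 0 if m[0] != q[0] else 1  # pick a coordinate in which m and q differ
    return Fraction(m[i] - p[i]) / (q[i] - m[i])

def menelaus_product(A, B, C, U, V):
    """(AZ/ZB)(BX/XC)(CY/YA) for the transversal through U and V (hypothetical helper)."""
    X = intersect(U, V, B, C)  # X on (BC)
    Y = intersect(U, V, A, C)  # Y on (AC)
    Z = intersect(U, V, A, B)  # Z on (AB)
    return sensed_ratio(A, Z, B) * sensed_ratio(B, X, C) * sensed_ratio(C, Y, A)

A, B, C = (0, 0), (1, 5), (6, 0)
# Case 1: the transversal crosses two sides of the triangle.
print(menelaus_product(A, B, C, (0, 1), (4, 3)))     # prints -1
# Case 2: the transversal misses all three sides.
print(menelaus_product(A, B, C, (0, -1), (10, -2)))  # prints -1
```

In the first call exactly one ratio is negative, and in the second all three are, matching the two cases of the proof.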
1.2.5 Consequences of the Ceva and Menelaus theorems


As one typically learns in an elementary geometry class, there are several
notions of “center” of a triangle. We shall review them here and
show their relationships to Ceva’s Theorem.


Centroid. In the triangle $\triangle ABC$ lines $(AX)$, $(BY)$, and $(CZ)$
are drawn so that $(AX)$ bisects $[BC]$, $(BY)$ bisects $[CA]$, and
$(CZ)$ bisects $[AB]$. That the lines $(AX)$, $(BY)$, and $(CZ)$ are
concurrent immediately follows from Ceva’s Theorem as one has that

$$\frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA} = 1 \times 1 \times 1 = 1.$$

The point of concurrency is called the centroid of $\triangle ABC$. The three
Cevians in this case are called medians.


Next, note that if we apply Menelaus’ theorem to the triangle
$\triangle ACX$ and the transversal defined by the points $B$, $Y$ and the centroid
$P$, then we have that

$$-1 = \frac{AY}{YC} \times \frac{CB}{BX} \times \frac{XP}{PA}
     = 1 \times (-2) \times \frac{XP}{PA}
     \;\Rightarrow\; \frac{XP}{PA} = \frac{1}{2}.$$

Therefore, we see that the distance of a triangle’s vertex to the centroid
is exactly 2/3 the length of the corresponding median.
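
The ratio $XP : PA = 1 : 2$ says that the centroid sits two-thirds of the way along each median, measured from the vertex. A minimal sketch of a check follows, assuming integer vertex coordinates and using the coordinate average as the centroid; the function names are hypothetical.

```python
from fractions import Fraction

def centroid(A, B, C):
    """Centroid as the average of the three vertices (integer coordinates assumed)."""
    return tuple(Fraction(a + b + c, 3) for a, b, c in zip(A, B, C))

def midpoint(P, Q):
    """Midpoint of the segment [PQ] (integer coordinates assumed)."""
    return tuple(Fraction(p + q, 2) for p, q in zip(P, Q))

def fraction_along(A, X, P):
    """Return t with P = A + t(X - A), assuming P lies on the line (AX)."""
    i = 0 if X[0] != A[0] else 1  # pick a coordinate in which A and X differ
    return Fraction(P[i] - A[i]) / (X[i] - A[i])

def vertex_to_centroid_fractions(A, B, C):
    """For each vertex, the fraction of its median lying between vertex and centroid."""
    G = centroid(A, B, C)
    return [fraction_along(V, midpoint(Q, R), G)
            for V, Q, R in ((A, B, C), (B, C, A), (C, A, B))]

print(vertex_to_centroid_fractions((0, 0), (1, 5), (6, 0)))
# prints [Fraction(2, 3), Fraction(2, 3), Fraction(2, 3)]
```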
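Orthocenter. In the triangle $\triangle ABC$ lines $(AX)$, $(BY)$, and
$(CZ)$ are drawn so that $(AX) \perp (BC)$, $(BY) \perp (CA)$, and
$(CZ) \perp (AB)$. Clearly we either have

$$\frac{AZ}{ZB}, \; \frac{BX}{XC}, \; \frac{CY}{YA} > 0$$

or that exactly one of these ratios is positive. We have

$$\triangle ABY \sim \triangle ACZ \;\Rightarrow\; \frac{AY}{AZ} = \frac{BY}{CZ}.$$

Likewise, we have

$$\triangle ABX \sim \triangle CBZ \;\Rightarrow\; \frac{BX}{BZ} = \frac{AX}{CZ}
  \quad\text{and}\quad
  \triangle CBY \sim \triangle CAX \;\Rightarrow\; \frac{CY}{CX} = \frac{BY}{AX}.$$

Therefore,

$$\frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA}
  = \frac{AZ}{AY} \times \frac{BX}{BZ} \times \frac{CY}{CX}
  = \frac{CZ}{BY} \times \frac{AX}{CZ} \times \frac{BY}{AX} = 1.$$

By Ceva’s theorem the lines $(AX)$, $(BY)$, and $(CZ)$ are concurrent, and
the point of concurrency is called the orthocenter of $\triangle ABC$. (The
line segments $[AX]$, $[BY]$, and $[CZ]$ are the altitudes of $\triangle ABC$.)
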

Incenter. In the triangle $\triangle ABC$ lines $(AX)$, $(BY)$, and $(CZ)$ are
drawn so that $(AX)$ bisects $B\widehat{A}C$, $(BY)$ bisects $A\widehat{B}C$, and
$(CZ)$ bisects $B\widehat{C}A$. As we show below, the lines $(AX)$, $(BY)$,
and $(CZ)$ are concurrent; the point of concurrency is called the incenter of
$\triangle ABC$. (A very interesting “extremal” property of the incenter will be given in
Exercise 12 on page 153.) However, we shall proceed below to give
another proof of this fact, based on Ceva’s Theorem.


Proof that the angle bisectors of $\triangle ABC$ are concurrent. In
order to accomplish this, we shall first prove the


ANGLE BISECTOR THEOREM. We are given the triangle $\triangle ABC$ with
line segment $[BP]$ (as indicated to the right). Then

$$\frac{AB}{BC} = \frac{AP}{PC} \;\Leftrightarrow\; A\widehat{B}P = P\widehat{B}C.$$

PROOF ($\Leftarrow$). We drop altitudes from $P$ to $(AB)$ and $(BC)$; call the
points so determined $Z$ and $Y$, respectively. Drop an altitude from
$B$ to $(AC)$ and call the resulting point $X$. Clearly $PZ = PY$ as
$\triangle PZB \cong \triangle PYB$. Next, we have

$$\triangle ABX \sim \triangle APZ \;\Rightarrow\; \frac{AB}{AP} = \frac{BX}{PZ} = \frac{BX}{PY}.$$

Likewise,

$$\triangle CBX \sim \triangle CPY \;\Rightarrow\; \frac{CB}{CP} = \frac{BX}{PY}.$$

Therefore,

$$\frac{AB}{BC} = \frac{AP \cdot BX}{PY} \times \frac{PY}{CP \cdot BX} = \frac{AP}{CP}.$$

($\Rightarrow$). Here we’re given that $\frac{AB}{BC} = \frac{AP}{PC}$. Let
$P'$ be the point determined by the angle bisector $(BP')$ of $A\widehat{B}C$. Then by what
has already been proved above, we have $\frac{AB}{BC} = \frac{AP'}{P'C}$. But this implies that

$$\frac{AP}{PC} = \frac{AP'}{P'C} \;\Rightarrow\; P = P'.$$


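Conclusion of the proof that angle bisectors are concurrent.
First of all, it is clear that the relevant ratios are all positive. By the
Angle Bisector Theorem,

$$\frac{AB}{BC} = \frac{AY}{YC}, \quad \frac{BC}{CA} = \frac{BZ}{ZA}, \quad \frac{AB}{AC} = \frac{BX}{XC};$$

therefore,

$$\frac{AZ}{BZ} \times \frac{BX}{XC} \times \frac{CY}{YA}
  = \frac{CA}{BC} \times \frac{AB}{AC} \times \frac{BC}{AB} = 1.$$

Ceva’s theorem now finishes the job!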


EXERCISES


1. The Angle Bisector Theorem involved the bisection of one of the
   given triangle’s interior angles. Now let $P$ be a point on the line
   $(AC)$ external to the segment $[AC]$. Show that the line $(BP)$
   bisects the external angle at $B$ if and only if

   $$\frac{AB}{BC} = \frac{AP}{PC}.$$

2. You are given the triangle $\triangle ABC$. Let $X$ be the point of
   intersection of the bisector of $B\widehat{A}C$ with $[BC]$ and let $Y$ be the point
   of intersection of the bisector of $C\widehat{B}A$ with $[AC]$. Finally, let $Z$ be
   the point of intersection of the exterior angle bisector at $C$ with
   the line $(AB)$. Show that $X$, $Y$, and $Z$ are collinear.³

3. Given $\triangle ABC$ and assume that $X$ is on $(BC)$, $Y$ is on $(AC)$ and
   $Z$ is on $(AB)$. Assume that the Cevians $(AX)$, $(BY)$, and $(CZ)$
   are concurrent, meeting at the point $P$. Show that

   $$\frac{PX}{AX} + \frac{PY}{BY} + \frac{PZ}{CZ} = 1.$$

4. Given the triangle $\triangle ABC$ with incenter $P$, prove that there exists
   a circle $\mathcal{C}$ (called the incircle of $\triangle ABC$) with center $P$ which is
   inscribed in the triangle $\triangle ABC$. The radius $r$ of the incircle is
   often called the inradius of $\triangle ABC$.

5. Let $\triangle ABC$ have side lengths $a = BC$, $b = AC$, and $c = AB$,
   and let $r$ be the inradius. Show that the area of $\triangle ABC$ is equal
   to $\frac{r(a+b+c)}{2}$. (Hint: the incenter partitions the triangle into three
   smaller triangles; compute the areas of each of these.)

6. Given the triangle $\triangle ABC$. Show that the bisector of the internal
   angle at $A$ and the bisectors of the external angles at $B$
   and $C$ are concurrent.

7. Given $\triangle ABC$ and points $X$, $Y$, and $Z$ in the plane such that

   $$\angle ABZ = \angle CBX, \quad \angle BCX = \angle ACY, \quad \angle BAZ = \angle CAY.$$

   Show that $(AX)$, $(BY)$, and $(CZ)$ are concurrent.

8. There is another notion of “center” of the triangle $\triangle ABC$. Namely,
   construct the lines $l_1$, $l_2$, and $l_3$ so as to be perpendicular bisectors
   of $[AB]$, $[BC]$, and $[CA]$, respectively. After noting that Ceva’s
   theorem doesn’t apply to this situation, prove directly that the
   lines $l_1$, $l_2$, and $l_3$ are concurrent. The point of concurrency is
   called the circumcenter of $\triangle ABC$. (Hint: argue that the point
   of concurrency of two of the perpendicular bisectors is equidistant
   to all three of the vertices.) If $P$ is the circumcenter, then the
   common value $AP = BP = CP$ is called the circumradius
   of the triangle $\triangle ABC$. (This is because the circumscribed circle
   containing $A$, $B$, and $C$ will have radius $AP$.)

9. $\triangle ABC$ has side lengths $AB = 21$, $AC = 22$, and $BC = 20$.
   Points $D$ and $E$ are on sides $[AB]$ and $[AC]$, respectively, such
   that $[DE] \parallel [BC]$ and $[DE]$ passes through the incenter of $\triangle ABC$.
   Compute $DE$.

³What happens if the exterior angle bisector at $C$ is parallel with $(AB)$?


10. Here’s another proof of Ceva’s theorem. You are given $\triangle ABC$ and
    concurrent Cevians $[AX]$, $[BY]$, and $[CZ]$, meeting at the point $P$.
    Construct the line segments $[AN]$ and $[CM]$, both parallel to the
    Cevian $[BY]$. Use similar triangles to conclude that

    $$\frac{AY}{YC} = \frac{AN}{CM}, \quad \frac{CX}{XB} = \frac{CM}{BP}, \quad \frac{BZ}{ZA} = \frac{BP}{AN},$$

    and hence that

    $$\frac{AZ}{ZB} \times \frac{BX}{XC} \times \frac{CY}{YA} = 1.$$

11. Through the vertices of the triangle $\triangle PQR$ lines are drawn
    which are parallel to the opposite sides of the triangle. Call the
    new triangle $\triangle ABC$. Prove that these two triangles have the same
    centroid.

12. Given the triangle $\triangle ABC$, let $\mathcal{C}$ be the inscribed circle, as in
    Exercise 4, above. Let $X$, $Y$, and $Z$ be the points of tangency
    of $\mathcal{C}$ (on the sides $[BC]$, $[AC]$, $[AB]$, respectively) and show that
    the lines $(AX)$, $(BY)$, and $(CZ)$ are concurrent. The point of
    concurrency is called the Gergonne point of the circle $\mathcal{C}$. (This
    is very easy once you note that $AZ = AY$, etc.!)

13. In the figure to the right, the dotted segments represent angle bisectors.
    Show that the points $P$, $R$, and $Q$ are collinear.

14. In the figure to the right, three circles of the same radius and centers
    $X$, $Y$ and $Z$ are shown intersecting at points $A$, $B$, $C$, and $D$, with $D$
    the common point of intersection of all three circles. Show that

    (a) $D$ is the circumcenter of $\triangle XYZ$, and that

    (b) $D$ is the orthocenter of $\triangle ABC$. (Hint: note that $YZCD$ is
        a rhombus.)

15. Show that the three medians of a triangle divide the triangle into
    six triangles of equal area.

16. Let the triangle $\triangle ABC$ be given, and let $A'$ be the midpoint of
    $[BC]$, $B'$ the midpoint of $[AC]$ and let $C'$ be the midpoint of $[AB]$.
    Prove that

    (i) $\triangle A'B'C' \sim \triangle ABC$ and that the ratios of the corresponding
        sides are 1:2.

    (ii) $\triangle A'B'C'$ and $\triangle ABC$ have the same centroid.

    (iii) The four triangles determined within $\triangle ABC$ by $\triangle A'B'C'$
          are all congruent.

    (iv) The circumcenter of $\triangle ABC$ is the orthocenter of $\triangle A'B'C'$.

    The triangle $\triangle A'B'C'$ of $\triangle ABC$ formed above is called the
    medial triangle of $\triangle ABC$.

17. The figure below depicts a hexagram “inscribed” in two lines. Using
    the prompts given, show that the points $X$, $Y$, and $Z$ are collinear.
    This result is usually referred to as Pappus’ theorem.

    [Figure: hexagram with vertices $A$, $B$, $C$ on one line and $D$, $E$, $F$ on the other.]

    Step 1. Locate the point $G$ on the lines $(AE)$ and $(FB)$; we shall
    analyze the triangle $\triangle GHI$ as indicated below.⁴

    Step 2. Look at the transversals, applying Menelaus’ theorem to
    each:

    $[DXB]$, so $\frac{GX}{XI} \times \frac{ID}{DH} \times \frac{HB}{BG} = -1$.
    $[AYF]$, so $\frac{GA}{AI} \times \frac{IY}{YH} \times \frac{HF}{FG} = -1$.
    $[CZE]$ (etc.)
    $[ABC]$ (etc.)
    $[DEF]$ (etc.)

    Step 3. Multiply the above five factorizations of $-1$, cancelling
    out all like terms!

⁴Of course, it may be that $(AE)$ and $(FB)$ are parallel. In fact, it may happen that all analogous
choices for pairs of lines are parallel, which would render the present theme invalid. However, while
the present approach uses Menelaus’ theorem, which is based on “metrical” ideas, Pappus’ theorem
is a theorem only about incidence and collinearity, making it really a theorem belonging to “projective
geometry.” As such, if the lines $(AE)$ and $(BF)$ were parallel, then projectively they would meet
“at infinity;” one could then apply a projective transformation to move this point at infinity to the
finite plane, preserving the collinearity of $X$, $Y$, and $Z$.



18. This time, let the hexagram be inscribed in a circle, as indicated to
    the right. By producing edges $[AC]$ and $[FD]$ to a common point $R$
    and considering the triangle $\triangle PQR$, prove Pascal’s theorem, namely
    that points $X$, $Y$, and $Z$ are collinear. (Proceed as in the proof
    of Pappus’ theorem: consider the transversals $[BXF]$, $[AYD]$, and
    $[CZE]$, multiplying together the factorizations of $-1$ which each
    produces.)

19. A straight line meets the sides $[PQ]$, $[QR]$, $[RS]$, and $[SP]$ of the
    quadrilateral $PQRS$ at the points $U$, $V$, $W$, and $X$, respectively.
    Use Menelaus’ theorem to show that

    $$\frac{PU}{UQ} \times \frac{QV}{VR} \times \frac{RW}{WS} \times \frac{SX}{XP} = 1.$$

20. The diagram to the right shows three circles of different radii with
    centers $A$, $B$, and $C$. The points $X$, $Y$, and $Z$ are defined by
    intersections of the tangents to the circles as indicated. Prove that $X$, $Y$,
    and $Z$ are collinear.


21. (The Euler line.) In this exercise you will be guided through the
    proof that in the triangle $\triangle ABC$, the centroid, circumcenter, and
    orthocenter are all collinear. The line so determined is called the
    Euler line.

    In the figure to the right, let $G$ be the centroid of $\triangle ABC$, and
    let $O$ be the circumcenter. Locate $P$ on the ray $\overrightarrow{OG}$ so that
    $GP : OG = 2 : 1$.

    (a) Let $A'$ be the intersection of $(AG)$ with $(BC)$; show that
        $\triangle OGA' \sim \triangle PGA$. (Hint: recall from page 13 that
        $GA : GA' = 2 : 1$.)

    (b) Conclude that $(AP)$ and $(OA')$ are parallel, which puts $P$ on the
        altitude through vertex $A$.

    (c) Similarly, show that $P$ is also on the altitudes through vertices
        $B$ and $C$, and so $P$ is the orthocenter of $\triangle ABC$.
1.2.6 Brief interlude: laws of sines and cosines

In a right triangle $\triangle ABC$, where C is a right angle, we have the familiar trigonometric ratios: setting $\theta = \hat{A}$, we have

$$\sin\theta = \frac{a}{c}, \qquad \cos\theta = \frac{b}{c};$$

the remaining trigonometric ratios (tan θ, csc θ, sec θ, cot θ) are all expressible in terms of sin θ and cos θ in the familiar way. Of crucial importance here is the fact that by similar triangles, these ratios depend only on θ and not on the particular choices of side lengths.⁵


We can extend the definitions of the trigonometric functions to arbitrary angles using coordinates in the plane. Thus, if θ is any given angle relative to the positive x-axis (whose measure can be anywhere between −∞ and ∞ degrees), and if (x, y) is any point on the terminal ray, then we set

$$\sin\theta = \frac{y}{\sqrt{x^2+y^2}}, \qquad \cos\theta = \frac{x}{\sqrt{x^2+y^2}}.$$

Notice that on the basis of the above definition, it is obvious that $\sin(180-\theta) = \sin\theta$ and that $\cos(180-\theta) = -\cos\theta$. Equally important (and obvious!) is the Pythagorean identity: $\sin^2\theta + \cos^2\theta = 1$.


⁵A fancier way of expressing this is to say that by similar triangles, the trigonometric functions are well defined.

LAW OF SINES. Given triangle $\triangle ABC$ and sides a, b, and c, as indicated, we have

$$\frac{\sin A}{a} = \frac{\sin B}{b} = \frac{\sin C}{c}.$$

PROOF. We note that

$$\tfrac{1}{2}bc\sin A = \text{area } \triangle ABC = \tfrac{1}{2}ba\sin C,$$

and so

$$\frac{\sin A}{a} = \frac{\sin C}{c}.$$

A similar argument shows that $\frac{\sin B}{b}$ is also equal to the above.


LAW OF COSINES. Given triangle $\triangle ABC$ and sides a, b, and c, as indicated, we have

$$c^2 = a^2 + b^2 - 2ab\cos C.$$

PROOF. Referring to the diagram to the right and using the Pythagorean Theorem, we infer quickly that

$$\begin{aligned}
c^2 &= (b - a\cos C)^2 + a^2\sin^2 C\\
    &= b^2 - 2ab\cos C + a^2\cos^2 C + a^2\sin^2 C\\
    &= a^2 + b^2 - 2ab\cos C,
\end{aligned}$$

as required.
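The two laws can be checked numerically. The following is only a sketch: the helper names `third_side` and `angle_from_sides` are hypothetical, and the 3-4-5 right triangle is chosen purely for illustration.

```python
import math

def third_side(a, b, angle_c_deg):
    """Side c opposite the angle C, from the Law of Cosines."""
    C = math.radians(angle_c_deg)
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(C))

def angle_from_sides(opposite, s1, s2):
    """Angle in degrees opposite `opposite`: the Law of Cosines solved for the cosine."""
    cos_value = (s1 * s1 + s2 * s2 - opposite * opposite) / (2 * s1 * s2)
    return math.degrees(math.acos(cos_value))

a, b, C = 3.0, 4.0, 90.0
c = third_side(a, b, C)        # a right angle at C reduces to Pythagoras
A = angle_from_sides(a, b, c)
B = angle_from_sides(b, a, c)
# Law of Sines: the three ratios sin(angle)/side should agree.
ratios = [math.sin(math.radians(t)) / s for t, s in ((A, a), (B, b), (C, c))]
```

Here `ratios` holds the three quotients sin A / a, sin B / b, sin C / c, which agree up to rounding.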


EXERCISES 


1. Using the Law of Sines, prove the Angle Bisector Theorem (see page 15).

2. Prove Heron’s formula. Namely, for the triangle whose side lengths are a, b, and c, prove that the area is given by

$$\text{area} = \sqrt{s(s-a)(s-b)(s-c)},$$

where $s = \frac{a+b+c}{2}$ = one-half the perimeter of the triangle. (Hint: if A is the area, then start with $16A^2 = 4b^2(c^2 - c^2\cos^2 A) = (2bc - 2bc\cos A)(2bc + 2bc\cos A)$. Now use the Law of Cosines to write $2bc\cos A$ in terms of a, b, and c and do a bit more algebra.)

3. In the quadrilateral depicted at the right, the lengths of the diagonals are a and b, and meet at an angle θ. Show that the area of this quadrilateral is $\frac{1}{2}ab\sin\theta$. (Hint: compute the area of each triangle, using the Law of Sines.)
4. In the triangle to the right, show


1 1 
1 1— 
ee 7 See ~ (where a 


i? = —1) c 


5. Given $\triangle ABC$ with C a right angle, let D be the midpoint of [AB] and show that $\triangle ADC$ is isosceles with AD = DC.

6. Given $\triangle ABC$ with BC = a, CA = b, and AB = c. Let D be the midpoint of [BC] and show that $AD = \frac{1}{2}\sqrt{2(b^2 + c^2) - a^2}$.


1.2.7 Algebraic results; Stewart’s theorem and Apollonius’ 
theorem 


STEWART’S THEOREM. We are given the triangle $\triangle ABC$, together with the edge BX, as indicated in the figure to the right. Then

$$a(p^2 + rs) = b^2 r + c^2 s.$$

PROOF. We set $\theta = A\hat{X}B$, where (as the two triangles used below require) p = BX, r = AX, s = XC, c = AB, b = BC, and a = AC. Applying the Law of Cosines to $\triangle AXB$ yields

$$\cos\theta = \frac{r^2 + p^2 - c^2}{2pr}.$$

Applying the Law of Cosines to the triangle $\triangle BXC$, whose angle at X is $180 - \theta$, gives

$$\cos\theta = \frac{b^2 - s^2 - p^2}{2ps}.$$

Equating the two expressions and noting that a = r + s eventually leads to the desired result.
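A quick numerical check of Stewart’s identity, offered as a sketch only. The labelling p = BX, r = AX, s = XC, a = AC, b = BC, c = AB is an assumption read off the two triangles named in the proof, and the function name and coordinates are hypothetical.

```python
import math

def stewart_sides(A, B, C, t):
    """Triangle A, B, C with X on [AC] such that AX = t * AC.

    Returns (a, b, c, p, r, s) with the assumed labelling
    a = AC, b = BC, c = AB, p = BX, r = AX, s = XC.
    """
    X = (A[0] + t * (C[0] - A[0]), A[1] + t * (C[1] - A[1]))
    d = math.dist
    return d(A, C), d(B, C), d(A, B), d(B, X), d(A, X), d(X, C)

a, b, c, p, r, s = stewart_sides((0, 0), (1, 3), (5, 0), 0.3)
lhs = a * (p * p + r * s)      # a(p^2 + rs)
rhs = b * b * r + c * c * s    # b^2 r + c^2 s
```

For this triangle both sides come out to 72.5 (up to rounding); taking t = 0.5 gives the median case used in the corollary that follows.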


COROLLARY [APOLLONIUS’ THEOREM]. We are given the triangle $\triangle ABC$, with sides a, b, and c, together with the median BX of length m, as indicated in the figure to the right. Then

$$b^2 + c^2 = 2m^2 + a^2/2.$$

If b = c (the triangle is isosceles), then the above reduces to

$$m^2 + (a/2)^2 = b^2.$$

This follows instantly from Stewart’s Theorem, on setting p = m and r = s = a/2.
EXERCISES 
1. Assume that the sides of a triangle are 4, 5, and 6. 


(a) Compute the area of this triangle. 


(b) Show that one of the angles is twice one of the other angles. 


2. (The Golden Triangle) You are given the triangle depicted to the right with $\triangle ABD \sim \triangle BCA$. Show that $\frac{DC}{AD} = \frac{\sqrt{5}+1}{2}$, the golden ratio.
3. Let $\triangle ABC$ be given with sides a = 11, b = 8, and c = 8. Assume that D and E are on side [BC] such that [AD], [AE] trisect $B\hat{A}C$. Show that AD = AE = 6.


4. You are given the equilateral triangle with sides of unit length, depicted to the right. Assume also that AF = BD = CE = r for some positive r < 1. Compute the area of the inner equilateral triangle. (Hint: try using similar triangles and Stewart’s theorem to compute AD = BE = CF.)


1.3 Circle Geometry 


1.3.1 Inscribed angles 


LEMMA. If a triangle $\triangle ABC$ is inscribed in a circle with [AB] being a diameter, then $A\hat{C}B$ is a right angle.

PROOF. The diagram to the right makes this obvious; from 2θ + 2φ = 180, we get θ + φ = 90°.

INSCRIBED ANGLE THEOREM. The measure of an angle inscribed in a circle is one-half that of the inscribed arc.

[Figure labels: ψ = 90 − φ₁/2, θ₁ + ψ = 90.]

PROOF. We draw a diameter, as indicated; from the above lemma, we see that θ₁ + ψ = 90. This quickly leads to φ₁ = 2θ₁. Similarly φ₂ = 2θ₂, and we’re done.


Before proceeding, we shall find the following concept useful. We are given a circle and points A, B, and P on the circle, as indicated to the right. We shall say that the angle $A\hat{P}B$ opens the arc AB. A degenerate instance of this is when B and P agree, in which case a tangent occurs. In this case we shall continue to say that the given angle opens the arc AB.


As an immediate corollary to the Inscribed Angle Theorem, we get 
the following: 


COROLLARY. Two angles which open the same arc are equal.
EXERCISES

1. In the diagram to the right, the arc AB has a measure of 110° and the measure of the angle $A\hat{C}B$ is 70°. Compute the measure of $A\hat{D}B$.⁶


2. Let [AB] be a diameter of the circle $\mathcal{C}$ and assume that C is a given point. If $A\hat{C}B$ is a right angle, then C is on the circle $\mathcal{C}$.

3. Let $\mathcal{C}$ be a circle having center O and diameter d, and let A, B, and C be points on the circle. If we set $\alpha = B\hat{A}C$, then $\sin\alpha = BC/d$. (Hint: note that by the inscribed angle theorem, $B\hat{A}C = P\hat{O}C$. What is the sine of $P\hat{O}C$?)


4. In the given figure AF = FC and PE = EC.

(a) Prove that triangle $\triangle FPA$ is isosceles.

(b) Prove that AB + BE = EC.

5. A circle is given with center O. The points E, O, B, D, and E are colinear, as are X, A, F, and C. The lines (XC) and (FD) are tangent to the circle at the points A and D respectively. Show that

(a) (AD) bisects $B\hat{A}C$;

(b) (AE) bisects $B\hat{A}X$.
6. Let $\triangle ABC$ have circumradius R. Show that

$$\text{Area } \triangle ABC = \frac{R(a\cos A + b\cos B + c\cos C)}{2},$$

where a = BC, b = AC, and c = AB. (See exercise 5, page 17 for the corresponding result for the inscribed circle.)


Circle of Apollonius 


CIRCLE OF APOLLONIUS. Assume that c ≠ 1 is a constant and that A and B are two given points. Then the locus of points

$$\left\{ P \;\middle|\; \frac{PA}{PB} = c \right\}$$

is a circle.

PROOF. This is actually a very simple application of the Angle Bisector Theorem (see also Exercise 1, page 16). Let P₁ and P₂ lie on the line (AB) subject to

$$\frac{AP_1}{P_1B} = c = \frac{AP_2}{BP_2}.$$

If we let P be an arbitrary point also subject to the same condition, then from the Angle Bisector Theorem we infer that $A\hat{P}P_1 = P_1\hat{P}B$ and $B\hat{P}P_2 = 180 - A\hat{P}P_2$.

This instantly implies that $P_1\hat{P}P_2$ is a right angle, from which we conclude (from Exercise 2, page 30 above) that P sits on the circle with diameter [P₁P₂], proving the result.
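The construction in the proof can be tried out numerically. In this sketch (the function name and sample points are hypothetical) we build the circle on the diameter [P₁P₂] and confirm that sample points on it satisfy PA/PB = c.

```python
import math

def apollonius_circle(A, B, c):
    """Center and radius of the circle with diameter [P1 P2], where P1 and P2
    lie on the line (AB) and divide [AB] internally and externally in the
    ratio c (with c > 0 and c != 1)."""
    P1 = tuple((a + c * b) / (1 + c) for a, b in zip(A, B))  # AP1 : P1B = c
    P2 = tuple((a - c * b) / (1 - c) for a, b in zip(A, B))  # AP2 : BP2 = c
    center = tuple((u + v) / 2 for u, v in zip(P1, P2))
    return center, math.dist(P1, P2) / 2

A, B, c = (0.0, 0.0), (4.0, 0.0), 3.0
center, radius = apollonius_circle(A, B, c)

# Sample twelve points on the circle and record the ratio PA / PB for each.
ratios = []
for k in range(12):
    t = 2 * math.pi * k / 12 + 0.1
    P = (center[0] + radius * math.cos(t), center[1] + radius * math.sin(t))
    ratios.append(math.dist(P, A) / math.dist(P, B))
```

With A = (0, 0), B = (4, 0), and c = 3, the two division points are (3, 0) and (6, 0), so the circle has center (4.5, 0) and radius 1.5.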
1.3.2 Steiner’s theorem and the power of a point

SECANT-TANGENT THEOREM. We are given a circle, a tangent line (PC) and a secant line (PA), where C is the point of tangency and where [AB] is a chord of the circle on the secant (see the figure to the right). Then

$$PC^2 = PA \times PB.$$

PROOF. This is almost trivial; simply note that $P\hat{C}A$ and $A\hat{B}C$ open the same arc. Therefore, $\triangle PCA \sim \triangle PBC$, from which the conclusion follows.


There is also an almost purely algebraic proof of this result.⁷

The following is immediate.

⁷If the radius of the circle is r and if the distance from P to the center of the circle is k, then denoting by d the distance along the line segment to either of the two points of intersection with the circle and using the Law of Cosines, we have that $r^2 = k^2 + d^2 - 2kd\cos\theta$ and so d satisfies the quadratic equation

$$d^2 - 2kd\cos\theta + k^2 - r^2 = 0.$$

The product of the two roots of this equation is $k^2 - r^2$, which is independent of the indicated angle θ.
COROLLARY. (Steiner’s Theorem) We are given a circle, and secant lines (PA) and (PC), where (PA) also intersects the circle at B and where (PC) also intersects the circle at D. Then

$$PA \times PB = PC \times PD.$$

PROOF. Note that only the case in which P is interior to the circle needs proof. However, since angles $C\hat{B}P$ and $P\hat{D}A$ open the same arc, they are equal. Therefore, it follows instantly that $\triangle PDA \sim \triangle PBC$, from which the result follows.


The product PA x PB of the distances from the point P to the 
points of intersection of the line through P with the given circle is 
independent of the line; it is called the power of the point with 
respect to the circle. It is customary to use signed magnitudes here, 
so that the power of the point with respect to the circle will be negative 
precisely when P is inside the circle. Note also that the power of the 
point P relative to a given circle C is a function only of the distance 
from P to the center of C. (Can you see why?) 
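The independence of the product from the chosen line is easy to observe numerically. The sketch below (the function name `chord_product` is hypothetical) parametrizes the line through P by a direction angle and returns the signed product of the two directed distances from P to the circle.

```python
import math

def chord_product(P, center, r, theta):
    """Signed product PA * PB for the line through P with direction angle theta.

    Points on the line are P + t*(cos theta, sin theta); the intersections with
    the circle solve t^2 + 2*t*(u.w) + |w|^2 - r^2 = 0, where w = P - center.
    Returns None when the line misses the circle.
    """
    ux, uy = math.cos(theta), math.sin(theta)
    wx, wy = P[0] - center[0], P[1] - center[1]
    half_b = ux * wx + uy * wy
    c0 = wx * wx + wy * wy - r * r
    disc = half_b * half_b - c0
    if disc < 0:
        return None
    t1 = -half_b + math.sqrt(disc)
    t2 = -half_b - math.sqrt(disc)
    return t1 * t2

# A point outside the circle (power 25 - 9 = 16) and one inside (power 2 - 9 = -7).
outside = [chord_product((5.0, 0.0), (0.0, 0.0), 3.0, th) for th in (3.0, 3.1, 3.2)]
inside = [chord_product((1.0, 1.0), (0.0, 0.0), 3.0, th) for th in (0.0, 0.7, 2.0)]
```

Every entry of `outside` is 16 and every entry of `inside` is −7 (up to rounding), matching d² − r² with the sign convention described above.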


The second case of Steiner’s theorem is sometimes called the “Inter- 
secting Chords Theorem.” 


EXERCISES 


1. In the complex plane, graph the equation |z + 16| = 4|z + 1|. How does this problem relate to any of the above?

2. Prove the “Explicit Law of Sines,” namely that if we are given the triangle $\triangle ABC$ with sides a, b, and c, and if R is the circumradius, then

$$\frac{a}{\sin A} = \frac{b}{\sin B} = \frac{c}{\sin C} = 2R.$$

Conclude that the perimeter of the triangle is a + b + c = 2R(sin A + sin B + sin C).


3. Let a circle be given with center O and radius r. Let P be a given point, and let d be the distance OP. Let ℓ be a line through P intersecting the circle at the points A and A′. Show that

(a) If P is inside the circle, then $PA \times PA' = r^2 - d^2$.

(b) If P is outside the circle, then $PA \times PA' = d^2 - r^2$.

Therefore, if we use signed magnitudes in defining the power of P relative to the circle with radius r, then the power of P relative to this circle is always $d^2 - r^2$.

4. Given the circle $\mathcal{C}$ and a real number p, describe the locus of all points P having power p relative to $\mathcal{C}$.

5. Let P be a point and let $\mathcal{C}$ be a circle. Let A and A′ be antipodal points on the circle (i.e., the line segment [AA′] is a diameter of $\mathcal{C}$). Show that the power of P relative to $\mathcal{C}$ is given by the vector dot product $\overrightarrow{PA}\cdot\overrightarrow{PA'}$. (Hint: Note that if O is the center of $\mathcal{C}$, then $\overrightarrow{PA} = \overrightarrow{PO} + \overrightarrow{OA}$ and $\overrightarrow{PA'} = \overrightarrow{PO} - \overrightarrow{OA}$. Apply exercise 3.)
6. Prove Van Schooten’s theorem. Namely, let $\triangle ABC$ be an equilateral triangle, and let $\mathcal{C}$ be the circumscribed circle. Let $M \in \mathcal{C}$ be a point on the shorter arc BC. Show that AM = BM + CM. (Hint: Construct the point D subject to AM = DM and show that $\triangle ABM \cong \triangle ACD$.)

7. The figure to the right shows the triangle $\triangle ABC$ inscribed in a circle. The tangent to the circle at the vertex A meets the line (BC) at D, the tangent to the circle at B meets the line (AC) at E, and the tangent to the circle at C meets the line (AB) at F. Show that D, E, and F are colinear. (Hint: note that $\triangle ACD \sim \triangle BAD$ (why?) and from this you can conclude that $\frac{DB}{DC} = \left(\frac{AB}{AC}\right)^2$. How does this help?)


1.3.3. Cyclic quadrilaterals and Ptolemy’s theorem 


As we have already seen, any triangle can be inscribed in a circle; this circle will have center at the circumcenter of the given triangle. It is then natural to ask whether the same can be said for arbitrary polygons. However, a moment’s thought reveals that this is, in general, false even for quadrilaterals. A quadrilateral that can be inscribed in a circle is called a cyclic quadrilateral.

THEOREM. The quadrilateral ABCD is cyclic if and only if

$$A\hat{B}C + C\hat{D}A = D\hat{A}B + B\hat{C}D = 180^\circ. \tag{1.1}$$

In other words, both pairs of opposite angles add to 180°.

PROOF. If the quadrilateral is cyclic, the result follows easily from the Inscribed Angle theorem. (Draw a picture and check it out!) Conversely, assume that the condition holds true. We let $\mathcal{C}$ be the circumscribed circle for the triangle $\triangle ABC$. If D were inside this circle, then clearly we would have $A\hat{B}C + C\hat{D}A > 180^\circ$. If D were outside this circle, then $A\hat{B}C + C\hat{D}A < 180^\circ$, proving the theorem.


The following is even easier:

THEOREM. The quadrilateral ABCD is cyclic if and only if $D\hat{A}C = D\hat{B}C$.

PROOF. The indicated angles open the same arc. The converse is also (relatively) easy.


Simson’s line (Wallace’s line). There is another line that can be naturally associated with a given triangle $\triangle ABC$, called Simson’s Line (or sometimes Wallace’s Line), constructed as follows.

Given the triangle $\triangle ABC$, construct the circumscribed circle $\mathcal{C}$ and arbitrarily choose a point P on the circle. From P drop perpendiculars to the lines (BC), (AC), and (AB), calling the points of intersection X, Y, and Z, as indicated in the figure below.

THEOREM. The points X, Y, and Z, constructed as above, are colinear. The resulting line is called Simson’s line (or Wallace’s line) of the triangle $\triangle ABC$.

PROOF. Referring to the diagram we note that $P\hat{Z}B$ and $P\hat{X}B$ are both right angles. This implies that $X\hat{P}Z + Z\hat{B}X = 180^\circ$ and so the quadrilateral PXBZ is cyclic. As a result, we conclude that $P\hat{X}Z = P\hat{B}Z$. Likewise, the smaller quadrilateral PXCY is cyclic and so $P\hat{C}A = P\hat{C}Y = P\hat{X}Y$. Therefore,

$$\begin{aligned}
P\hat{X}Z &= P\hat{B}Z\\
          &= P\hat{B}A\\
          &= P\hat{C}A \quad\text{(angles open the same arc)}\\
          &= P\hat{C}Y\\
          &= P\hat{X}Y,
\end{aligned}$$

which clearly implies that X, Y, and Z are colinear.
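As a sanity check, one can drop the three perpendiculars numerically and test colinearity. This is a sketch under simplifying assumptions: the circumscribed circle is taken to be the unit circle centered at the origin, and the names `foot`, `circle_point`, and `simson_area` are hypothetical.

```python
import math

def foot(P, U, V):
    """Foot of the perpendicular from P to the line (UV)."""
    dx, dy = V[0] - U[0], V[1] - U[1]
    t = ((P[0] - U[0]) * dx + (P[1] - U[1]) * dy) / (dx * dx + dy * dy)
    return (U[0] + t * dx, U[1] + t * dy)

def circle_point(deg, radius=1.0):
    """Point at the given polar angle (degrees) on a circle about the origin."""
    t = math.radians(deg)
    return (radius * math.cos(t), radius * math.sin(t))

def simson_area(a_deg, b_deg, c_deg, p_deg, p_radius=1.0):
    """Twice the signed area of triangle XYZ; it vanishes iff X, Y, Z are colinear."""
    A, B, C = (circle_point(d) for d in (a_deg, b_deg, c_deg))
    P = circle_point(p_deg, p_radius)
    X, Y, Z = foot(P, B, C), foot(P, A, C), foot(P, A, B)
    return (Y[0] - X[0]) * (Z[1] - X[1]) - (Y[1] - X[1]) * (Z[0] - X[0])

on_circle = simson_area(10, 130, 250, 70)                 # P on the circumscribed circle
off_circle = simson_area(10, 130, 250, 70, p_radius=1.5)  # P moved off the circle
```

The first value is zero up to rounding, while moving P off the circle gives a genuinely nonzero area, so the hypothesis that P lies on the circle is doing real work.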


PTOLEMY’S THEOREM. If the quadrilateral ABCD is cyclic, then the product of the two diagonals is equal to the sum of the products of the opposite side lengths:

$$AC\cdot BD = AB\cdot CD + AD\cdot BC.$$

When the quadrilateral is not cyclic, then

$$AC\cdot BD < AB\cdot CD + AD\cdot BC.$$

PROOF. Whether or not the quadrilateral is cyclic, we can construct the point E so that $\triangle CAD$ and $\triangle CEB$ are similar. This immediately implies that

$$\frac{CE}{CA} = \frac{CB}{CD} = \frac{BE}{DA},$$

from which we obtain

$$BE = \frac{CB\cdot DA}{CD}. \tag{1.2}$$

Also, it is clear that $E\hat{C}A = B\hat{C}D$; since also

$$\frac{CD}{CA} = \frac{CB}{CE},$$

we may infer that $\triangle ECA \sim \triangle BCD$. Therefore,

$$\frac{EA}{BD} = \frac{CA}{CD},$$

forcing

$$EA = \frac{CA\cdot DB}{CD}. \tag{1.3}$$

If it were the case that ABCD were cyclic, then by (1.1) we would have

$$C\hat{B}E + A\hat{B}C = C\hat{D}A + A\hat{B}C = 180^\circ.$$

But this clearly implies that A, B, and E are colinear, forcing

$$EA = AB + BE.$$

Using (1.2) and (1.3) we get

$$\frac{CA\cdot DB}{CD} = AB + \frac{CB\cdot DA}{CD},$$

proving the first part of Ptolemy’s theorem.

Assume, conversely, that ABCD is not cyclic, in which case it follows that

$$C\hat{B}E + A\hat{B}C = C\hat{D}A + A\hat{B}C \neq 180^\circ.$$

This implies that the points A, B, and E form a triangle from which it follows that EA < AB + BE. As above we apply (1.2) and (1.3) and get

$$\frac{CA\cdot DB}{CD} < AB + \frac{CB\cdot DA}{CD},$$

and so

$$CA\cdot DB < AB\cdot CD + CB\cdot DA,$$

proving the converse.
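Both halves of the theorem can be observed numerically. The following sketch (the names `ptolemy_gap` and `circle_point`, and the sample angles, are hypothetical) computes AB·CD + AD·BC − AC·BD for a cyclic quadrilateral and for the same quadrilateral with D pushed off the circle.

```python
import math

def ptolemy_gap(A, B, C, D):
    """AB*CD + AD*BC - AC*BD for the quadrilateral ABCD (vertices in order)."""
    d = math.dist
    return d(A, B) * d(C, D) + d(A, D) * d(B, C) - d(A, C) * d(B, D)

def circle_point(deg, radius=1.0):
    """Point at the given polar angle (degrees) on a circle about the origin."""
    t = math.radians(deg)
    return (radius * math.cos(t), radius * math.sin(t))

cyclic = [circle_point(t) for t in (20, 100, 200, 310)]
gap_cyclic = ptolemy_gap(*cyclic)

# Push D off the circle: the quadrilateral is no longer cyclic.
non_cyclic = cyclic[:3] + [circle_point(310, radius=1.4)]
gap_non_cyclic = ptolemy_gap(*non_cyclic)
```

The gap vanishes (up to rounding) in the cyclic case and is strictly positive otherwise, in line with the inequality in the theorem.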


COROLLARY. (The Addition Formulas for Sine and Cosine) We have, for angles α and β, that

$$\sin(\alpha+\beta) = \sin\alpha\cos\beta + \sin\beta\cos\alpha; \qquad \cos(\alpha+\beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta.$$

PROOF. We shall draw a cyclic quadrilateral inside a circle having diameter AC = 1 (as indicated), and leave the details to the reader. (Note that by Exercise 3 on page 30, we have that BD = sin(α + β) (see the figure). To obtain the addition formula for cos, note that cos α = sin(α + π/2).)
EXERCISES


1. [AB] and [AC] are chords of a circle with center O. X and Y are 
the midpoints of [AB] and [AC], respectively. Prove that O, X, A, 
and Y are concyclic points. 


2. Derive the Pythagorean Theorem from Ptolemy’s theorem. (This 
is very easy!) 


3. Derive Van Schooten’s theorem (see page 35) as a consequence of 
Ptolemy’s theorem. (Also very easy!) 


4. Use the addition formula for the sine to prove that if ABCD is a 
cyclic quadrilateral, then AC-BD = AB-DC+AD- BC. 


5. Show that if ABCD is a cyclic quadrilateral with side lengths a, b, c, and d, then the area K is given by

$$K = \sqrt{(s-a)(s-b)(s-c)(s-d)},$$

where s = (a + b + c + d)/2 is the semiperimeter.⁸


⁸This result is due to the ancient Indian mathematician Brahmagupta (598-668).

1.4 Internal and External Divisions; the Harmonic Ratio

The notion of internal and external division of a line segment [AB] is perhaps best motivated by the familiar picture involving internal and external bisection of a triangle’s angle (see the figure to the right). Referring to this figure, we say that the point X divides the segment [AB] internally and that the point Y divides the segment [AB] externally. In general, if A, B, and X are colinear points, we set $A;X;B = \frac{AX}{XB}$ (signed magnitudes); if A;X;B > 0 we call this quantity the internal division of [AB], and if A;X;B < 0 we call this quantity the external division of [AB]. Finally, we say that the colinear points A, B, X, and Y are in a harmonic ratio if

$$A;X;B = -A;Y;B;$$

that is to say, when

$$\frac{AX}{XB} = -\frac{AY}{YB} \quad\text{(signed magnitudes)}.$$

It follows immediately from the Angle Bisector Theorem (see page 15) that when (CX) bisects the interior angle at C in the figure above and (CY) bisects the exterior angle at C, then A, B, X, and Y are in harmonic ratio.

Note that in order for the points A, B, X, and Y to be in a harmonic ratio it is necessary that one of the points X, Y be interior to [AB] and the other be exterior to [AB]. Thus, if X is interior to [AB] and Y is exterior to [AB] we see that A, B, X, and Y are in a harmonic ratio precisely when

Internal division of [AB] by X = −External division of [AB] by Y.
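For points on a number line the definitions can be computed exactly. The sketch below uses coordinates on a line in place of the points themselves; the names `division` and `harmonic_conjugate` are hypothetical. It finds the point Y that completes A, B, X to a harmonic ratio.

```python
from fractions import Fraction

def division(a, x, b):
    """A;X;B = AX / XB with signed lengths, for coordinates a, x, b on a line."""
    return Fraction(x - a) / Fraction(b - x)

def harmonic_conjugate(a, b, x):
    """Coordinate y with A;Y;B = -(A;X;B).

    Fails with ZeroDivisionError when X is the midpoint of [AB], since then
    no finite point Y exists.
    """
    k = -division(a, x, b)           # required value of AY / YB
    # (y - a) / (b - y) = k  gives  y = (a + k*b) / (1 + k)
    return (a + k * b) / (1 + k)

a, b, x = 0, 6, 4
y = harmonic_conjugate(a, b, x)
internal = division(a, x, b)
external = division(a, y, b)
```

With A = 0, B = 6, and X = 4 the internal division is 2, the conjugate point is Y = 12, and the external division is −2, so the internal division is the negative of the external one.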


EXERCISES 


1. Let A, B, and C be colinear points with (A;B;C)(B;A;C) = −1. Show that the golden ratio is the positive factor on the left-hand side of the above equation.

2. Let A, B, and C be colinear points and let λ = A;B;C. Show that under the 6 = 3! permutations of A, B, C, the possible values of A;B;C are

$$\lambda,\quad \frac{1}{\lambda},\quad -(1+\lambda),\quad -\frac{1}{1+\lambda},\quad -\frac{1+\lambda}{\lambda},\quad -\frac{\lambda}{1+\lambda}.$$
3. Let A, B, X, and Y be colinear points. Define the cross ratio by setting

$$[A,B;X,Y] = \frac{A;X;B}{A;Y;B} = \frac{AX}{AY}\times\frac{YB}{XB} \quad\text{(signed magnitudes)}.$$

Show that the colinear points A, B, X, and Y are in harmonic ratio if [A, B; X, Y] = −1.

4. Show that for colinear points A, B, X, and Y one has

$$[A,B;X,Y] = [X,Y;A,B] = [B,A;Y,X] = [Y,X;B,A].$$

Conclude from this that under the 4! = 24 permutations of A, B, X, and Y, there are at most 6 different values of the cross ratio.

5. Let A, B, X, and Y be colinear points, and set λ = [A, B; X, Y]. Show that under the 4! permutations of A, B, X, and Y, the possible values of the cross ratio are

$$\lambda,\quad \frac{1}{\lambda},\quad 1-\lambda,\quad \frac{1}{1-\lambda},\quad \frac{\lambda}{\lambda-1},\quad \frac{\lambda-1}{\lambda}.$$

6. If A, B, X, and Y are in a harmonic ratio, how many possible values are there of the cross ratio [A, B; X, Y] under permutations?

7. Let A and B be given points.

(a) Show that the locus of points {M | MA = 3MB} is a circle.

(b) Let X and Y be the points of intersection of (AB) with the circle described in part (a) above. Show that the points A, B, X, and Y are in a harmonic ratio.

8. Show that if [A, B; X, Y] = 1, then either A = B or X = Y.

9. The harmonic mean of two real numbers a and b is given by $\frac{2ab}{a+b}$. Assume that the points A, B, X, and Y are in a harmonic ratio. Show that AB is the harmonic mean of AX and AY.⁹


10. The figure to the right depicts two circles having an orthogonal intersection. (What should this mean?) Relative to the diagram to the right (O and O′ are the centers), show that A, C, B, and D are in a harmonic ratio.

11. The figure to the right shows a semicircle with center O and diameter XZ. The segment [PY] is perpendicular to [XZ] and the segment [QY] is perpendicular to [OP]. Show that PQ is the harmonic mean of XY and YZ.


1.5 The Nine-Point Circle 




One of the most subtle mysteries of Euclidean geometry is the existence of the so-called “nine-point circle,” that is, a circle which passes through nine very naturally pre-prescribed points.

To appreciate the miracle which this presents, consider first that arranging for a circle to pass through three noncolinear points is, of course, easy: this is the circumscribed circle of the triangle defined by these points (and having center at the circumcenter). To see that a circle will not, in general, pass through four points (even if no three are colinear) we need only recall that not all quadrilaterals are cyclic. Yet, as we see, if the nine points are carefully, but naturally, defined, then such a circle does exist!

⁹The harmonic mean occurs in elementary algebra and is how one computes the average rate at which a given task is accomplished. For example, if I walk to the store at 5 km/hr and walk home at a faster rate of 10 km/hr, then the average rate of speed at which I walk is given by

$$\frac{2\times 5\times 10}{5+10} = \frac{20}{3} \text{ km/hr}.$$


THEOREM. (Nine-Point Circle) Given the triangle $\triangle ABC$, construct the following nine points:

(i) The bases of the three altitudes;

(ii) The midpoints of the three sides;

(iii) The midpoints of the segments joining the orthocenter to each of the vertices.

Then there is a unique circle passing through these nine points.


PROOF. Refer to the picture below, where A, B, and C are the vertices, and X, Y, and Z are the midpoints. The midpoints referred to in (iii) above are P, Q, and R. Finally, O is the orthocenter of $\triangle ABC$.

By the Midpoint Theorem (Exercise 3 on page 6) applied to $\triangle ACO$, the line (YP) is parallel to (AX′). Similarly, the line (YZ) is parallel to (BC). This implies immediately that ∠PYZ is a right angle. Similarly, the Midpoint Theorem applied to $\triangle ABC$ and to $\triangle CBO$ implies that (XZ) and (AC) are parallel as are (PX) and (BY′). Therefore, ∠PXZ is a right angle. By the theorem on page 35 we conclude that the quadrilateral YPXZ is cyclic and hence the corresponding points all lie on a common circle. Likewise, the quadrilateral PXZZ′ is cyclic, forcing its vertices to lie on a common circle. As three noncolinear points determine a unique circle (namely the circumscribed circle of the corresponding triangle; see Exercise 8 on page 17) we have already that P, X, Y, Z, and Z′ all lie on a common circle.

In an entirely analogous fashion we can show that the quadrilaterals YXQZ and YXZR are cyclic and so we now have that P, Q, R, X, Y, Z, and Z′ all lie on a common circle. Further analysis of cyclic quadrilaterals puts X′ and Y′ on this circle, and we’re done!

Note, finally, that the center of the nine-point circle of $\triangle ABC$ lies on this triangle’s Euler line (see page 22).
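The theorem is easy to test on a concrete triangle. The sketch below is not the proof’s method: it uses coordinates, together with the standard fact that the orthocenter equals A + B + C − 2·(circumcenter) as vectors. All function names and the sample triangle are hypothetical.

```python
import math

def circumcenter(A, B, C):
    """Center of the circle through three noncolinear points."""
    ax, ay = A
    bx, by = B
    cx, cy = C
    d = 2 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    sa, sb, sc = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (sa * (by - cy) + sb * (cy - ay) + sc * (ay - by)) / d
    uy = (sa * (cx - bx) + sb * (ax - cx) + sc * (bx - ax)) / d
    return (ux, uy)

def mid(U, V):
    return ((U[0] + V[0]) / 2, (U[1] + V[1]) / 2)

def foot(P, U, V):
    """Foot of the perpendicular from P to the line (UV)."""
    dx, dy = V[0] - U[0], V[1] - U[1]
    t = ((P[0] - U[0]) * dx + (P[1] - U[1]) * dy) / (dx * dx + dy * dy)
    return (U[0] + t * dx, U[1] + t * dy)

def nine_points(A, B, C):
    circ = circumcenter(A, B, C)
    # Orthocenter via the vector identity: orthocenter = A + B + C - 2*circumcenter.
    ortho = (A[0] + B[0] + C[0] - 2 * circ[0], A[1] + B[1] + C[1] - 2 * circ[1])
    feet = [foot(A, B, C), foot(B, A, C), foot(C, A, B)]
    mids = [mid(B, C), mid(A, C), mid(A, B)]
    upper = [mid(ortho, A), mid(ortho, B), mid(ortho, C)]
    return feet + mids + upper

A, B, C = (0.0, 0.0), (7.0, 0.0), (2.0, 5.0)
pts = nine_points(A, B, C)
center = circumcenter(*pts[3:6])            # circle through the three side midpoints
radii = [math.dist(center, p) for p in pts]
circumradius = math.dist(circumcenter(A, B, C), A)
```

All nine distances in `radii` agree, and their common value is half the circumradius of the original triangle.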


EXERCISES. 


1. Prove that the center of the nine-point circle is the circumcenter 
of AXY Z. 


2. Referring to the above diagram, prove that the center of the nine-point circle lies at the midpoint of the segment [NO], where N is the circumcenter of $\triangle ABC$.

3. Given $\triangle ABC$, let O be its orthocenter. Let $\mathcal{C}$ be the nine-point circle of $\triangle ABC$, and let $\mathcal{C}'$ be the circumscribed circle of $\triangle ABC$. Show that $\mathcal{C}$ bisects any line segment drawn from O to $\mathcal{C}'$.


1.6 Mass point geometry 


Mass point geometry is a powerful and useful viewpoint particularly 
well suited to proving results about ratios—especially of line segments. 
This is often the province of the Ceva and Menelaus theorems, but, as 
we'll see, the present approach is both easier and more intuitive. 


Before getting to the definitions, the following problem might help us fix our ideas. Namely, consider $\triangle ABC$ with Cevians [AD] and [CE] as indicated to the right. Assume that we have ratios BE : EA = 3 : 4 and CD : DB = 2 : 5. Compute the ratios EF : FC and DF : FA.

Both of the above ratios can be computed fairly easily using the converse to Menelaus’ theorem. First consider $\triangle CBE$. From the converse to Menelaus’ theorem, we have, since A, F, and D are colinear, that (ignoring the minus sign)

$$1 = \frac{2}{5}\times\frac{7}{4}\times\frac{EF}{FC},$$

forcing EF : FC = 10 : 7.

Next consider $\triangle ABD$. Since the points E, F, and C are colinear, we have that (again ignoring the minus sign)

$$1 = \frac{4}{3}\times\frac{7}{2}\times\frac{DF}{FA},$$

and so DF : FA = 3 : 14.


Intuitively, what’s going on can be viewed in the following very tangible (i.e., physical) way. Namely, if we assign “masses” to the points of $\triangle ABC$, say

A has mass 3/2; B has mass 2; and C has mass 5,

then the point E is at the center of mass of the weighted line segment [AB] and has mass 7/2, and D is at the center of mass of the weighted line segment [BC] and has mass 7. This suggests that F should be at the center of mass of both of the weighted line segments [CE] and [AD], and should have total mass 17/2. This shows why DF : FA = 3/2 : 7 = 3 : 14 and why EF : FC = 5 : 7/2 = 10 : 7.


We now formalize the above intuition as follows. By a mass point we mean a pair (n, P), usually written simply as nP, where n is a positive number and where P is a point in the plane.¹⁰ We define an addition by the rule mP + nQ = (m + n)R, where the point R is on the line segment [PQ], and is at the center of mass inasmuch as PR : RQ = n : m. We view this as below.

[Figure: the segment from mP to nQ, with (m + n)R dividing it into parts proportional to n and m.]

It is clear that the above addition is commutative in the sense that xP + yQ = yQ + xP. However, what isn’t immediately obvious is that this addition is associative, i.e., that xP + (yQ + zR) = (xP + yQ) + zR for positive numbers x, y, and z, and points P, Q, and R. The proof is easy, but it is precisely where the converse to Menelaus’ theorem comes in! Thus, let

$$yQ + zR = (y+z)S, \qquad xP + yQ = (x+y)T.$$


Let W be the point of intersection of the Cevians [PS] and [RT]. 


¹⁰Actually, we can take P to be in higher-dimensional space, if desired!

[Figure: the mass points yQ, (x+y)T, (y+z)S, and zR.]

Applying the converse to Menelaus’ theorem to the triangle $\triangle PQS$, we have, since T, W, and R are colinear, that (ignore the minus sign)

$$1 = \frac{PT}{TQ}\times\frac{QR}{RS}\times\frac{SW}{WP} = \frac{y}{x}\times\frac{y+z}{y}\times\frac{SW}{WP}.$$

This implies that PW : WS = (y + z) : x, which implies that

$$(x+y+z)W = xP + (y+z)S = xP + (yQ + zR).$$

Similarly, by applying the converse of Menelaus to $\triangle QRT$, we have that (x + y + z)W = (x + y)T + zR = (xP + yQ) + zR, and we’re done, since we have proved that

$$xP + (yQ + zR) = (x+y+z)W = (xP + yQ) + zR.$$

The point of all this is that given mass points xP, yQ, and zR, we may unambiguously denote the “center of mass” of these points by writing xP + yQ + zR.


Let’s return one more time to the example introduced at the beginning of this section. The figure below depicts the relevant information. Notice that the assignments of masses to A, B, and C are uniquely determined up to a nonzero multiple.

The point F is located at the center of mass; in particular it is on the line segments [AD] and [CE]; furthermore its total mass is 17/2. As a result, we have that AF : FD = 7 : 3/2 = 14 : 3 and CF : FE = 7/2 : 5 = 7 : 10, in agreement with what was proved above.
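The mass-point addition rule can be carried out with exact arithmetic. The sketch below reproduces the example using the masses 3/2, 2, and 5; the coordinates chosen for A, B, and C and the helper names `add` and `ratio` are hypothetical, and any nondegenerate triangle would do.

```python
from fractions import Fraction as Fr

def add(mp1, mp2):
    """mP + nQ = (m + n)R, with R on [PQ] and PR : RQ = n : m."""
    (m, P), (n, Q) = mp1, mp2
    R = tuple((m * p + n * q) / (m + n) for p, q in zip(P, Q))
    return (m + n, R)

def ratio(P, R, Q):
    """PR : RQ for a point R on the segment [PQ], as an exact fraction."""
    for p, r, q in zip(P, R, Q):
        if q != r:
            return (r - p) / (q - r)
    raise ValueError("R coincides with Q")

A, B, C = (0, 0), (7, 0), (0, 7)
mA, mB, mC = (Fr(3, 2), A), (Fr(2), B), (Fr(5), C)

E = add(mA, mB)            # center of mass of [AB], total mass 7/2
D = add(mB, mC)            # center of mass of [BC], total mass 7
F_via_CE = add(E, mC)      # (A + B) + C
F_via_AD = add(mA, D)      # A + (B + C)

ef_fc = ratio(E[1], F_via_CE[1], C)   # EF : FC
df_fa = ratio(D[1], F_via_AD[1], A)   # DF : FA
```

Both groupings give the same mass point F of total mass 17/2, illustrating associativity, and the two ratios come out as 10/7 and 3/14.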


We mention in passing that mass point geometry can be used to prove Ceva’s theorem (and its converse) applied to $\triangle ABC$ when the Cevians [AX], [BY], and [CZ] meet the triangle’s sides [BC], [AC], and [AB], respectively. If we are given that

$$\frac{AZ}{ZB}\times\frac{BX}{XC}\times\frac{CY}{YA} = 1,$$

we assign mass ZB to vertex A, mass AZ to vertex B, and mass $\frac{AZ\cdot BX}{XC}$ to vertex C. Since $ZB : \frac{AZ\cdot BX}{XC} = CY : YA$, we see that the center of mass will lie on the intersection of the three Cevians above. Conversely, if we’re given the three concurrent Cevians [AX], [BY], and [CZ], then assigning masses as above will place the center of mass at the intersection of the Cevians [AX] and [CZ]. Since the center of mass is also on the Cevian [BY], we infer that

$$\frac{CY}{YA} = \frac{ZB\cdot XC}{AZ\cdot BX},$$

and we’re done!


We turn to a few examples, with the hopes of conveying the utility 
of this new approach. We emphasize: the problems that follow can 
all be solved without mass point geometry; however, the mass point 
approach is often simpler and more intuitive! 
EXAMPLE 1. Show that the medians of $\triangle ABC$ are concurrent and the point of concurrency (the centroid) divides each median in a ratio of 2 : 1.

SOLUTION. We assign mass 1 to each of the points A, B, and C, giving rise to the following weighted triangle:

[Figure: the triangle with masses 1A, 1B, and 1C.]

The point G, being the center of mass, is on the intersection of all three medians; hence they are concurrent. The second statement is equally obvious as AG : GD = 2 : 1; similarly for the other ratios.


EXAMPLE 2. In $\triangle ABC$, D is the midpoint of [BC] and E is on [AC] with AE : EC = 1 : 2. Letting G be the intersection of the Cevians [AD] and [BE], find AG : GD and BG : GE.

SOLUTION. The picture below tells the story:

[Figure: the triangle with masses 2A, 1B, 1C, 2D, and 3E.]

From the above, one has AG : GD = 1 : 1, and BG : GE = 3 : 1.
EXAMPLE 3. Prove that the angle bisectors of $\triangle ABC$ are concurrent.

PROOF. Assume that AB = c, AC = b, BC = a and assign masses a, b, and c to points A, B, and C, respectively. We have the following picture:

[Figure: the triangle with masses aA, bB, and cC.]

Note that as a result of the Angle Bisector Theorem (see page 15) each of the Cevians above is an angle bisector. Since the center of mass is on each of these Cevians, the result follows.


The above applications have to do with Cevians. The method of mass point geometry also can be made to apply to transversals, i.e., lines through a triangle not passing through any of the vertices. We shall discuss the necessary modification (i.e., mass splitting) in the context of the following example.

SOLUTION. The above examples were primarily concerned with computing ratios along particular Cevians. In case a transversal is involved, then the method of “mass splitting” becomes useful. To best appreciate this, recall that if in the triangle $\triangle ABC$ we assign mass a to A, b to B, and c to C, then the center of mass P is located on the intersection of the three Cevians (as depicted below). However, suppose that we “split” the mass b at B into two components b = b₁ + b₂; then the center of mass P will not only lie at the intersection of the concurrent Cevians, it will also lie on the transversal [QR]; see below:

[Figure: (b₁ + b₂)B at the vertex B; (a + b + c)P = (a + b₁)Q + (b₂ + c)R.]

Note that in the above diagram, QP : PR = (b₂ + c) : (a + b₁) because P is the center of mass of [QR].


EXAMPLE 4. In the figure below, compute EF: FD and BF: FG. 


SOLUTION. We shall arrange the masses so that the point F is the center of mass. So we start by assigning weights to A and B to obtain a balance of [AB] at E: clearly, assigning mass 4 to B and 3 to A will accomplish this. Next, to balance [AC] at G we need to assign mass 9/7 to C. Finally, to balance [BC] at D, we need another mass of 18/35 at B, producing a total mass of 4 + 18/35 at B. The point F is now at the center of mass of the system! See the figure below:

[Figure: the weighted triangle, with (4 + 18/35)B at the vertex B.]

From the above, it’s easy to compute the desired ratios:

$$EF : FD = \tfrac{9}{5} : 7 = 9 : 35 \quad\text{and}\quad BF : FG = \tfrac{30}{7} : \tfrac{158}{35} = 75 : 79.$$


EXERCISES 


1. In $\triangle ABC$, $D$ is the midpoint of $[BC]$ and $E$ is on $[AC]$ with
$AE : EC = 1 : 2$. Let $G$ be the intersection of segments $[BE]$ and
$[AD]$ and find $AG : GD$ and $BG : GE$.


2. In $\triangle ABC$, $D$ is on $[AB]$ with $AD = 3$ and $DB = 2$. $E$ is on $[BC]$
with $BE = 3$ and $EC = 4$. Compute $EF : FA$.


3. In quadrilateral $ABCD$, $E$, $F$, $G$, and $H$ are the trisection points
of $[AB]$, $[BC]$, $[CD]$, and $[DA]$ nearer $A$, $C$, $C$, and $A$, respectively.
Show that $EFGH$ is a parallelogram. (Show that the diagonals
bisect each other.)


4. Let $[AD]$ be an altitude in $\triangle ABC$, and assume that $\angle B = 45^\circ$
and $\angle C = 60^\circ$. Assume that $F$ is on $[AC]$ such that $[BF]$ bisects
$\angle B$. Let $E$ be the intersection of $[AD]$ and $[BF]$ and compute
$AE : ED$ and $BE : EF$.




5. In triangle $ABC$, point $D$ is on $[BC]$ with $CD = 2$ and $DB = 5$,
point $E$ is on $[AC]$ with $CE = 1$ and $EA = 3$, $AB = 8$, and $[AD]$
and $[BE]$ intersect at $P$. Points $Q$ and $R$ lie on $[AB]$ so that $[PQ]$
is parallel to $[CA]$ and $[PR]$ is parallel to $[CB]$. Find the ratio of
the area of $\triangle PQR$ to the area of $\triangle ABC$.


6. In $\triangle ABC$, let $E$ be on $[AC]$ with $AE : EC = 1 : 2$, let $F$ be
on $[BC]$ with $BF : FC = 2 : 1$, and let $G$ be on $[EF]$ with
$EG : GF = 1 : 2$. Finally, assume that $D$ is on $[AB]$ with $C$, $D$, $G$
collinear. Find $CG : GD$ and $AD : DB$.


7. In $\triangle ABC$, let $E$ be on $[AB]$ such that $AE : EB = 1 : 3$, let $D$ be
on $[BC]$ such that $BD : DC = 2 : 5$, and let $F$ be on $[ED]$ such
that $EF : FD = 3 : 4$. Finally, let $G$ be on $[AC]$ such that the
segment $[BG]$ passes through $F$. Find $AG : GC$ and $BF : FG$.


8. You are given the figure to the right. 


(a) Show that $BJ : JF = 3 : 4$ and
$AJ : JH = 6 : 1$.

(b) Show that 
DK KE LC = A 
BJtIK te KA= 
UW d6 dod Se SOs O; 


(c) Show that the area of $\triangle JKL$ is one-seventh the area of $\triangle ABC$.


(Hint: start by assigning masses 1 to A, 4 to B and 2 to C.) 


9. Generalize the above result by replacing “2” by $n$. Namely, show
that the area ratio


$\text{area } \triangle JKL : \text{area } \triangle ABC = (n-1)^3 : (n^3 - 1).$


(This is a special case of Routh’s theorem.)


Exercise 5 is essentially problem #13 on the 2002 American Invitational
Mathematics Examination (II).


Chapter 2 


Discrete Mathematics 


2.1 Elementary Number Theory 


While probably an oversimplification, “number theory” can be said to
be concerned with the mathematics of the ordinary whole numbers:


$0, \pm 1, \pm 2, \ldots.$


We shall, for convenience denote the set of whole numbers by Z. 


Notice that the famous Fermat conjecture¹ falls into this context,
as it asserts that


For any integer $n \ge 3$, the equation

$x^n + y^n = z^n$

has no solution with $x, y, z \in \mathbb{Z}$
with $x, y, z \ne 0$.


Of course, the assertion is false with $n = 1$ or 2 as, for instance,
$3^2 + 4^2 = 5^2$.


¹which was proved by Andrew Wiles in 1995


2.1.1 The division algorithm


Very early on, students learn the arithmetic of integers, namely, that
of addition, subtraction, multiplication, and division. In particular,
students learn (in about the fifth or sixth grade) that a positive integer
$a$ can be divided into a non-negative integer $b$, resulting in a quotient
$q$ and a remainder $r$:


$b = qa + r, \quad 0 \le r < a.$


For instance, the following division of 508 by 28 should serve as an
ample reminder.


       18
  28 ) 508
       28
       228
       224
         4


In this case the quotient is 18 and the remainder is 4:
$508 = 18 \cdot 28 + 4$.


The fact that the above is always possible is actually a theorem: 


THEOREM. (Division Algorithm) Let $a, b \in \mathbb{Z}$, where $a > 0$, $b \ge 0$.
Then there exist unique integers $q$ and $r$ such that


$b = qa + r$, where $0 \le r < a$.


PROOF. Let $S$ be the following subset of the set $\mathbb{Z}$ of integers:


$S = \{\, b - xa \mid x \in \mathbb{Z} \text{ and } b - xa \ge 0 \,\}.$

This set is nonempty, since $b = b - 0 \cdot a \in S$.
Now let $r$ be the smallest element of this set; note that $r \ge 0$, and let
$q$ be defined so that $r = b - qa$. Therefore, we already have $b = qa + r$.
Notice that if $r \ge a$, then we may set $r' = r - a \ge 0$ and so


$r' = r - a = b - qa - a = b - (q+1)a.$


We see, therefore, that $r' \in S$; since $r' < r$ this contradicts our choice
of $r$ in the first place! Hence $r < a$.


Next, we shall show that the quotient and remainder are unique.
Therefore, assume that


$b = qa + r = q'a + r'$, where $0 \le r, r' < a$.


Therefore we conclude that $(q - q')a = r' - r$. Since $0 \le r', r < a$ we
see that $|r' - r| < a$ and so $|(q - q')a| = |r' - r| < a$ which clearly forces
$q - q' = 0$. But then $r' = r$ and we’re done!²
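
The existence half of the proof can be mirrored in a few lines of Python. This is only a sketch: the function name `division_algorithm` is ours, and the repeated subtraction is chosen because it follows the set $S$ in the proof, not because it is efficient. The result is compared with Python's built-in `divmod`.

```python
def division_algorithm(b: int, a: int) -> tuple[int, int]:
    """Return (q, r) with b = q*a + r and 0 <= r < a, assuming a > 0 and b >= 0."""
    if a <= 0 or b < 0:
        raise ValueError("this sketch assumes a > 0 and b >= 0, as in the theorem")
    q, r = 0, b
    # Walk down through the elements b, b - a, b - 2a, ... of S until the
    # smallest non-negative one is reached; that element is the remainder.
    while r >= a:
        r -= a
        q += 1
    return q, r


print(division_algorithm(508, 28))  # (18, 4)
print(divmod(508, 28))              # (18, 4)
```
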


In the above, if r = 0, and so b = qa, we say that a divides b and 
write a|b. 


If a,b € Z and not both are 0, we say that the integer d is the 
greatest common divisor of a and b if 


(i) $d > 0$,
(ii) $d \mid a$ and $d \mid b$,
(iii) if also $d' \mid a$ and $d' \mid b$ and if $d' > 0$, then $d' \le d$.


EXAMPLE. In small examples, it’s easy to compute the greatest common
divisor of integers. For example, the greatest common divisor of
24 and 16 is easily seen to be 8. In examples such as this, the greatest
common divisor is typically obtained by factoring the given numbers
into prime factors. However, there is an even more efficient approach,
based on the “Euclidean trick” and on the “Euclidean algorithm.”


²The assumption in the theorem that $a$ and $b$ are both non-negative was made only out of
convenience. In general, the division algorithm states that for two integers $a, b \in \mathbb{Z}$, with $a \ne 0$,
there exist unique integers $q$ and $r$ such that


$b = qa + r$, where $0 \le r < |a|$.


THEOREM. (The Euclidean Trick) Let $a, b \in \mathbb{Z}$, not both zero. Then
the greatest common divisor $d$ of $a$ and $b$ exists. Furthermore, $d$ has
the curious representation as


$d = sa + tb,$


for suitable integers $s$ and $t$.


PROOF. Consider the set


$S = \{\, xa + yb \mid x, y \in \mathbb{Z} \text{ and } xa + yb > 0 \,\},$


and let $d$ be the smallest integer in $S$ (so $d > 0$), and let $d = sa + tb$.
Since the greatest common divisor of $|a|$ and $|b|$ is clearly the same as
the greatest common divisor of $a$ and $b$, we may as well just assume
that $a$ and $b$ are both positive. Apply the division algorithm and divide
$d$ into both $a$ and $b$:


$a = q_1 d + r_1, \quad b = q_2 d + r_2, \quad 0 \le r_1, r_2 < d.$


Since $r_1 = a - q_1 d = a - q_1(sa + tb) = (1 - q_1 s)a - q_1 t b$, we see that if
$r_1 > 0$, then $r_1 \in S$, which is impossible since $r_1 < d$, and $d$ was taken
to be the smallest element of $S$. Therefore, we must have that $r_1 = 0$,
which means that $d \mid a$. Similarly, $r_2 = 0$ and so $d \mid b$. If $d'$ were another
positive integer which divides $a$ and $b$, then $a = md'$ and $b = nd'$, and
so $d = sa + tb = s(md') + t(nd') = (sm + tn)d'$ which clearly forces
$d' \mid d$ and so $d' \le d$.


NOTATION: We shall denote the greatest common divisor of a and b 
by gcd(a, b). 


COROLLARY. If $d = \gcd(a, b)$ and if $d'$ is any integer satisfying $d' \mid a$
and $d' \mid b$, then also $d' \mid d$.


PROOF. This is easy! There exist integers $s$ and $t$ with $sa + tb = d$;
given that $d'$ divides both $a$ and $b$, then obviously $d'$ divides the sum
$sa + tb = d$, i.e., $d' \mid d$ also.


We shall present the following without a formal proof. The interested
reader should be able to trace through the steps.


THEOREM. (The Euclidean Algorithm) Let $a$ and $b$ be integers, and
assume that $a > 0$. Perform the following divisions:


$b = q_1 a + r_1, \quad 0 \le r_1 < a.$


If $r_1 = 0$ then $a \mid b$ and so, in fact, $a = \gcd(a, b)$. If $r_1 > 0$, divide $r_1$ into
$a$:


$a = q_2 r_1 + r_2, \quad 0 \le r_2 < r_1.$


If $r_2 = 0$ then one shows easily that $r_1 = \gcd(a, b)$. If $r_2 > 0$, we divide
$r_2$ into $r_1$:


$r_1 = q_3 r_2 + r_3, \quad 0 \le r_3 < r_2.$


If $r_3 = 0$, then $r_2 = \gcd(a, b)$. If $r_3 > 0$, we continue as above, eventually
obtaining $\gcd(a, b)$ as the last nonzero remainder in this process.
Furthermore, retracing the steps also gives the “multipliers” $s$ and $t$
satisfying $sa + tb = \gcd(a, b)$.


EXAMPLE. To compute $\gcd(84, 342)$ we can do this by factoring: $84 = 6 \cdot 14$
and $342 = 6 \cdot 57$, from which we get $\gcd(84, 342) = 6$. However, if
we apply the Euclidean algorithm, one has

$342 = 4 \cdot 84 + 6,$

$84 = 14 \cdot 6 + 0.$

Therefore, again, $6 = \gcd(84, 342)$. However, we immediately see from
the first equation that $6 = 1 \cdot 342 - 4 \cdot 84$, so with $a = 84$ and $b = 342$
we can take $s = -4$ and $t = 1$.
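
The “retracing” of steps can be carried along with the divisions themselves. The sketch below is one common way to do this (the extended Euclidean algorithm); the name `extended_gcd` and the choice to return the triple $(d, s, t)$ are our own conventions, and non-negative inputs are assumed.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, s, t) with d = gcd(a, b) = s*a + t*b, for a, b >= 0 not both zero."""
    # Invariants kept throughout: r0 = s0*a + t0*b and r1 = s1*a + t1*b.
    r0, s0, t0 = a, 1, 0
    r1, s1, t1 = b, 0, 1
    while r1 != 0:
        q = r0 // r1                      # quotient of the current division step
        r0, r1 = r1, r0 - q * r1          # next remainder
        s0, s1 = s1, s0 - q * s1          # update the multiplier of a
        t0, t1 = t1, t0 - q * t1          # update the multiplier of b
    return r0, s0, t0


d, s, t = extended_gcd(84, 342)
print(d, s, t)               # 6 -4 1
print(s * 84 + t * 342)      # 6
```

For $84$ and $342$ this reproduces the relation $6 = 1 \cdot 342 - 4 \cdot 84$ found in the example.
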


Let $a$ and $b$ be integers. We say that the positive integer $l$ is the
least common multiple of $a$ and $b$, if


(i) $l$ is a multiple of both $a$ and $b$,
(ii) if $l'$ is a positive multiple of both $a$ and $b$ then $l \le l'$.


We denote the least common multiple of $a$ and $b$ by $\mathrm{lcm}(a, b)$.


Assume that $a$ and $b$ are integers satisfying $\gcd(a, b) = 1$. Then we
say that $a$ and $b$ are relatively prime. We say that the integer $p > 1$
is prime if the only positive divisors of $p$ are 1 and $p$ itself. Note that
if $p$ is prime and if $a$ is any integer not divisible by $p$, then clearly $p$
and $a$ are relatively prime.


LEMMA. Assume that $a$ and $b$ are relatively prime integers and that
the integer $a \mid bc$ for some integer $c$. Then, in fact, $a \mid c$.


PROOF. We have for some integers $s, t \in \mathbb{Z}$ that $sa + tb = 1$.
Therefore $sac + tbc = c$. Since $a \mid bc$, we have $bc = qa$ for some integer
$q$, forcing


$c = sac + tbc = (sc + tq)a$


which says that $a \mid c$, as required.


LEMMA. Assume that $a$ and $b$ are relatively prime integers, both dividing
the integer $l$. Then $ab \mid l$.


PROOF. We have that $l = bc$ for a suitable integer $c$. Since $a \mid l$ we have
that $a \mid bc$; apply the above lemma to conclude that $a \mid c$, i.e., $c = ar$ for
some integer $r$. Finally, $l = bc = bar$ which says that $ab \mid l$.


THEOREM. Given the integers $a, b > 0$, $\mathrm{lcm}(a, b) = \dfrac{ab}{\gcd(a, b)}$.


PROOF. Let $d = \gcd(a, b)$ and set $l = \dfrac{ab}{d}$. Clearly $l$ is a multiple of
both $a$ and $b$. Next, if $s$ and $t$ are integers such that $sa + tb = d$, then

$s \cdot \dfrac{a}{d} + t \cdot \dfrac{b}{d} = 1$, proving that $a' = \dfrac{a}{d}$ and $b' = \dfrac{b}{d}$ are relatively prime.

Now let $l'$ be any positive multiple of both $a$ and $b$, say $l' = am = a'dm$
for some integer $m$. Since $b = b'd$ divides $l' = a'dm$, we see that
$b' \mid a'm$; as $a'$ and $b'$ are relatively prime, the first lemma above
gives $b' \mid m$, say $m = b'k$. Therefore $l' = ab'k = lk$, since $ab' = \dfrac{ab}{d} = l$.
In other words, $l \mid l'$, and hence $l \le l'$, proving that
$l = \mathrm{lcm}(a, b)$.
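
As a small sanity check of the theorem, the following sketch compares the formula $ab/\gcd(a, b)$ with a direct search for the smallest common multiple. The function names are ours; `math.gcd` is the standard-library greatest common divisor.

```python
from math import gcd


def lcm_by_formula(a: int, b: int) -> int:
    """lcm(a, b) = a*b / gcd(a, b), for positive integers a and b."""
    return a * b // gcd(a, b)


def lcm_by_search(a: int, b: int) -> int:
    """Smallest positive integer that is a multiple of both a and b (brute force)."""
    l = max(a, b)
    while l % a != 0 or l % b != 0:
        l += 1
    return l


print(lcm_by_formula(84, 342))   # 4788
print(lcm_by_search(84, 342))    # 4788
```
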


EXERCISES 


1. Assume that $a$ and $b$ are integers and that $d > 0$ is an integer
dividing both $a$ and $b$. Show that if for some integers $s, t \in \mathbb{Z}$ we
have $d = sa + tb$, then $d = \gcd(a, b)$.


2. Assume that $a$ and $b$ are integers and that there exist integers
$s, t \in \mathbb{Z}$ such that $sa + tb = 1$. Show that $a$ and $b$ are relatively
prime.


3. Find $\gcd(1900, 399)$, $\mathrm{lcm}(1900, 399)$. Find an explicit representation
$\gcd(1900, 399) = 1900s + 399t$, $s, t \in \mathbb{Z}$.


4. Find $\gcd(2100, 399)$, $\mathrm{lcm}(2100, 399)$. Find an explicit representation
$\gcd(2100, 399) = 2100s + 399t$, $s, t \in \mathbb{Z}$.


5. Assume that n is a positive integer and that a, b € Z with gcd(a,n) = 
gcd(b,n) = 1. Prove that gcd(ab,n) = 1. 


6. Assume that $p$ is a prime, $a$ and $b$ are integers and that $p \mid ab$. Use
the Euclidean trick to show that either $p \mid a$ or $p \mid b$.


7. Assume that $a$ and $b$ are relatively prime and that $a \mid bc$ for some
integer $c$. Prove that $a \mid c$.


8. Show that for all integers $n > 0$, $6 \mid n(n + 1)(2n + 1)$.³


³Of course, this is obvious to those who know the formula:

$\sum_{k=1}^{n} k^2 = \dfrac{n(n+1)(2n+1)}{6}.$


9. Let $p$ be a prime and show that for all integers $h$, $1 \le h \le p - 1$,
$p \mid \binom{p}{h}$. Conclude that for any integers $x$ and $y$, the numbers
$(x + y)^p$ and $x^p + y^p$ have the same remainder when divided by $p$.


10. Show that the converse of Exercise 9 is also true. Namely, if $n$
is a positive integer such that for all integers $h$, $1 \le h \le n - 1$,
$n \mid \binom{n}{h}$, then $n$ is prime. (Hint: Assume that $n$ isn’t prime, and
let $p$ be a prime divisor of $n$. Show that if $p^r$ is the highest power
of $p$ dividing $n$, then $p^{r-1}$ is the highest power of $p$ dividing $\binom{n}{p}$.)


11. Assume that $n$, $m$ are positive integers and $k$ is an exponent such
that $n \mid (m^k - 1)$. Show that for any non-negative integer $h$,

$n \mid (m^{hk} - 1).$


12. Assume that you have two measuring vessels, one with a capacity
of $a$ liters and one of a capacity of $b$ liters. For the sake of specificity,
assume that we have an 8-liter vessel and a 5-liter vessel.
Using these vessels we may dip into a river and measure out certain
amounts of water. For example, if I wish to measure exactly
3 liters of water I could fill the 8-liter vessel, and then from this
fill the 5-liter vessel; what remains in the 8-liter vessel is exactly 3
liters of water.


(a) Using the 8-liter vessel and 5-liter vessel, explain how to measure
out exactly 1 liter of water.


(b) Assume that we return to the general situation, viz., where
we have an $a$-liter vessel and a $b$-liter vessel. Explain how to
measure out exactly $d$ liters of water, where $d = \gcd(a, b)$.


13. Let $a$ and $b$ be integers, both relatively prime to the positive integer
$n$. Show that $ab$ is also relatively prime to $n$.


14. Here’s a cute application of the Euclidean Algorithm. Let $a$ and
$b$ be positive integers and let $q_k$, $r_k$, $k = 1, 2, \ldots$, be the sequence
of integers determined as in the Euclidean Algorithm (page 59).
Assume that $r_m$ is the first zero remainder. Then⁴

$\dfrac{b}{a} = q_1 + \cfrac{1}{q_2 + \cfrac{1}{\ddots \, + \cfrac{1}{q_{m-1} + \cfrac{1}{q_m}}}}$


⁴The expression given is often called a simple finite continued fraction.

Im 
15. For any positive integer $n$, let $U_n$ be the set of all integers relatively
prime to $n$. Now let $m$, $n$ be relatively prime positive integers and
show that
$U_{mn} = U_m \cap U_n.$


16. Let $n > 1$ be an integer and define the so-called Euler $\phi$-function
(or Euler’s totient function) by setting


$\phi(n)$ = # of integers $m$, $1 \le m < n$, which are relatively prime with $n$.
Now prove the following.


(a) If $p$ is prime, then $\phi(p) = p - 1$.

(b) If $p$ is prime, and if $e$ is a positive integer, then $\phi(p^e) = p^{e-1}(p - 1)$.

(c) If $m$ and $n$ are relatively prime, $\phi(mn) = \phi(m)\phi(n)$. (Hint:
Try this line of reasoning. Let $1 \le k < mn$ and let $r_m$, $r_n$ be
the remainders of $k$ by dividing by $m$ and $n$, respectively. Show
that if $\gcd(k, mn) = 1$, then $\gcd(r_m, m) = \gcd(r_n, n) = 1$.
Conversely, assume that we have integers $1 \le r_m < m$ and
$1 \le r_n < n$ with $\gcd(r_m, m) = \gcd(r_n, n) = 1$. Apply the
Euclidean trick to obtain integers $s, s_m, s_n, t, t_n, t_m$ satisfying


$sm + tn = 1, \quad s_m r_m + t_m m = 1, \quad s_n r_n + t_n n = 1.$


Let $k = s m r_n + t n r_m$, and let $k_{mn}$ be the remainder obtained
by dividing $k$ by $mn$. Show that $1 \le k_{mn} < mn$ and that
$\gcd(k_{mn}, mn) = 1$. This sets up a correspondence between the
positive integers less than $mn$ and relatively prime to $mn$ and
the pairs of integers less than $m$ and $n$ and relatively prime to
$m$ and $n$, respectively.)

(d) Show that for any positive integer $n$, $\phi(n) \ge \sqrt{n}/2$. (Hint:
prove that for every prime power $p^e$, where $e$ is a positive
integer, $\phi(p^e) \ge p^{e/2}$, unless $p = 2$ and $e = 1$. What happens
in this case?)


See the footnote.⁵


17. Given the positive integer $n$, and the positive divisor $d \mid n$, show that
the number of integers $k$ satisfying $1 \le k \le n$ with $\gcd(k, n) = d$
is $\phi\!\left(\frac{n}{d}\right)$. Conclude that


$\sum_{d \mid n} \phi(d) = n,$


where the above sum is over the positive divisors of $n$.

18. Let $q$ and $n$ be positive integers. Show that


(# of integers $m$, $1 \le m \le qn$, which are relatively prime with $n$) $= q\,\phi(n)$.


19. Suppose that $x$ is a positive integer with $x = qn + r$, $0 \le r < n$.
Show that


$q\,\phi(n) \le$ (# of integers $m$, $1 \le m \le x$, which are relatively prime with $n$) $\le (q + 1)\phi(n)$.


20. Conclude from Exercises 18 and 19 that

$\lim_{x \to \infty} \dfrac{\text{\# of integers } m,\ 1 \le m \le x, \text{ which are relatively prime with } n}{x} = \dfrac{\phi(n)}{n}.$


⁵Euler’s $\phi$-function has an interesting recipe, the proof of which goes somewhat beyond the scope
of these notes (it involves the notion of “inclusion-exclusion”). The formula says that for any integer
$n > 1$,

$\phi(n) = n \prod_{p \mid n} \left(1 - \dfrac{1}{p}\right),$

where the product is taken over prime divisors $p$ of $n$. A main ingredient in proving this is the result
of Exercise 17, above. Note that this formula immediately implies that $\phi(mn) = \phi(m)\phi(n)$ when $m$
and $n$ are relatively prime.
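
The product formula in the footnote can be compared numerically with the counting definition of $\phi$. The sketch below is only a check on small values, not a proof; the function names are ours, and for convenience the count runs over $1 \le m \le n$, which agrees with the definition for $n > 1$ and gives $\phi(1) = 1$.

```python
from math import gcd


def phi_by_counting(n: int) -> int:
    """Count the integers m with 1 <= m <= n and gcd(m, n) = 1."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


def phi_by_product(n: int) -> int:
    """Evaluate n * prod(1 - 1/p) over the prime divisors p of n, in integer arithmetic."""
    result, remaining, p = n, n, 2
    while p * p <= remaining:
        if remaining % p == 0:
            while remaining % p == 0:
                remaining //= p           # strip every factor p
            result -= result // p         # multiply result by (1 - 1/p)
        p += 1
    if remaining > 1:                     # one prime factor larger than sqrt is left
        result -= result // remaining
    return result


print(phi_by_counting(12), phi_by_product(12))                 # 4 4
print(sum(phi_by_counting(d) for d in (1, 2, 3, 4, 6, 12)))    # 12
```

The second line printed illustrates the identity of Exercise 17 for $n = 12$.
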
2.1.2 The linear Diophantine equation $ax + by = c$


Suppose that $a$, $b$, $c$ are integers and suppose that we wish to find all
possible solutions of the linear Diophantine equation $ax + by = c$.
First of all we need a condition on $a$, $b$, $c$ in order to guarantee the
existence of a solution.


THEOREM. The linear Diophantine equation $ax + by = c$ has a solution
if and only if $\gcd(a, b) \mid c$.


PROOF. Set $d = \gcd(a, b)$ and assume that $c = kd$ for some integer $k$.
Apply the Euclidean trick to find integers $s$ and $t$ with $sa + tb = d$;
multiply through by $k$ and get $a(sk) + b(tk) = kd = c$. A solution is
therefore $x = sk$ and $y = tk$. Conversely, assume that $ax + by = c$.
Then since $d \mid a$ and $d \mid b$, we see that $d \mid (ax + by)$, i.e., $d \mid c$, proving the
theorem.


As the above indicates, applying the Euclidean algorithm will yield
a solution of the Diophantine equation $ax + by = c$. We would like now
to show how to obtain the general solution of this equation, that is to
find a recipe for generating all possible solutions. Let’s start with a fixed
solution $(x_0, y_0)$ and let $(x, y)$ be another solution. This immediately
implies that $ax_0 + by_0 = c$, $ax + by = c$ and so $a(x_0 - x) = b(y - y_0)$.
Setting $d = \gcd(a, b)$ we have


$\dfrac{a}{d}(x_0 - x) = \dfrac{b}{d}(y - y_0).$


Next, we note that since $\dfrac{a}{d}$ and $\dfrac{b}{d}$ are relatively prime, then by
Exercise 7 on page 61 we have that $\dfrac{a}{d}$ divides $y - y_0$, say $y - y_0 = \dfrac{a}{d}t$ for
some integer $t$. But then $\dfrac{a}{d}(x_0 - x) = \dfrac{b}{d} \cdot \dfrac{a}{d}t$, forcing $x_0 - x = \dfrac{b}{d}t$. In
other words, starting with a fixed solution $(x_0, y_0)$ of the Diophantine
equation $ax + by = c$ we know that any other solution $(x, y)$ must have
the form

$x = x_0 - \dfrac{b}{d}t, \quad y = y_0 + \dfrac{a}{d}t, \quad t \in \mathbb{Z}. \qquad (2.1)$


Finally, we see (by substituting into the equation) that the above
actually is a solution; therefore we have determined all solutions of the
given Diophantine equation. We summarize.


THEOREM. Given the linear Diophantine equation $ax + by = c$ where
$c$ is a multiple of $d = \gcd(a, b)$, and given a particular solution $(x_0, y_0)$,
the general solution is given by


$x = x_0 - \dfrac{b}{d}t, \quad y = y_0 + \dfrac{a}{d}t, \quad t \in \mathbb{Z}.$

EXAMPLE. Consider the Diophantine equation $2x + 3y = 48$.
(i) Find all solutions of this equation.


(ii) Find all positive solutions, i.e., all solutions $(x, y)$ with $x, y > 0$.


SOLUTION. First of all, a particular solution can be found by simple
inspection: clearly $(x, y) = (24, 0)$ is a solution. Next, since 2 and 3 are
relatively prime we conclude from the above theorem that the general
solution is given by


$x = 24 - 3t, \quad y = 2t, \quad t \in \mathbb{Z}.$


Next, if we seek only positive solutions then clearly $t > 0$ and $24 - 3t > 0$.
This reduces immediately to $0 < t < 8$, which is equivalent with saying
that $1 \le t \le 7$. That is, the positive solutions are described by writing


$x = 24 - 3t, \quad y = 2t, \quad t \in \mathbb{Z}, \quad 1 \le t \le 7.$
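
The recipe of this subsection (a particular solution from the Euclidean trick, then the general solution (2.1)) can be sketched in Python as follows. All function names are ours, positive coefficients $a$ and $b$ are assumed, and the particular solution produced by the code will generally differ from one found by inspection, although the set of all solutions is the same.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, s, t) with d = gcd(a, b) = s*a + t*b."""
    r0, s0, t0, r1, s1, t1 = a, 1, 0, b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return r0, s0, t0


def general_solution(a: int, b: int, c: int):
    """For a, b > 0 return (x0, y0, dx, dy) so that all solutions of a*x + b*y = c are
    x = x0 - dx*t, y = y0 + dy*t (t any integer); return None if gcd(a, b) does not divide c."""
    d, s, t = extended_gcd(a, b)
    if c % d != 0:
        return None
    k = c // d
    return s * k, t * k, b // d, a // d


def positive_solutions(a: int, b: int, c: int) -> list[tuple[int, int]]:
    """All solutions with x > 0 and y > 0, read off from the general solution."""
    sol = general_solution(a, b, c)
    if sol is None:
        return []
    x0, y0, dx, dy = sol
    t_min = (-y0) // dy + 1        # smallest t with y0 + dy*t > 0
    t_max = -((-x0) // dx) - 1     # largest t with x0 - dx*t > 0
    return sorted((x0 - dx * t, y0 + dy * t) for t in range(t_min, t_max + 1))


print(positive_solutions(2, 3, 48))
# [(3, 14), (6, 12), (9, 10), (12, 8), (15, 6), (18, 4), (21, 2)]
```
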


EXERCISES 


1. Find all integer solutions of the Diophantine equation $4x + 6y = 100$.
Also, find all positive solutions.


2. Find all solutions of $15x + 16y = 900$, with $x, y > 0$.


3. Suppose that someone bought a certain number of 39-cent pens 
and a certain number of 69-cent pens, paying $11.37 for the total. 
Find the number of 39-cent pens and the number of 69-cent pens 
purchased. 


4. I recently purchased a number of DVDs at 6¥ each and a number
of DVDs at 7¥ each, paying 249¥ for the total. Find the number of
6¥ DVDs and the number of 7¥ DVDs assuming that I purchased
approximately the same number of each.


5. Solve 15x — 24y = 3, x,y > 0. 


6. Farmer Jones owes Farmer Brown $10. Both are poor, and neither
has any money, but Farmer Jones has 14 cows valued at $184 each
and Farmer Brown has a large collection of pigs, each valued at
$110. Is there a way for Farmer Jones to pay off his debt?


7. A Pythagorean triple is a triple (a,b,c) of positive integers 
such that a? + b? = c?. Therefore, (3,4,5) is an example of a 
Pythagorean triple. So is (6,8,10). Call a Pythagorean triple 
(a,b,c) primitive if a, b, and c share no common factor greater 
than 1. Therefore, (3,4,5) is a primitive Pythagorean triple, but 
(6,8, 10) is not. 


(a) Assume that $s$ and $t$ are positive integers such that
(i) $t < s$,
(ii) $s$ and $t$ are relatively prime, and
(iii) one of $s$, $t$ is odd; the other is even.
Show that if $x = 2st$, $y = s^2 - t^2$, $z = s^2 + t^2$, then $(x, y, z)$ is
a primitive Pythagorean triple.
(b) Show that every primitive Pythagorean triple occurs as in (a), above,
possibly after interchanging the first two entries.


8. This problem involves a system of Diophantine equations.⁶ Ed and
Sue bike at equal and constant rates. Similarly, they jog at equal
and constant rates, and they swim at equal and constant rates. Ed
covers 74 kilometers after biking for 2 hours, jogging for 3 hours,
and swimming for 4 hours, while Sue covers 91 kilometers after
jogging for 2 hours, swimming for 3 hours, and biking for 4 hours.
Their biking, jogging, and swimming rates are all whole numbers
in kilometers per hour. Find these rates.


⁶Essentially Problem #3 from the 2008 American Invitational Mathematics Examination.


2.1.3 The Chinese remainder theorem 


CONGRUENCE AND THE INTEGERS MODULO n. If $n$ is a positive
integer, and if $a$ and $b$ are integers, we say that $a$ is congruent to
$b$ modulo $n$ and write $a \equiv b \pmod{n}$ if $n \mid (a - b)$. Next, we write
$\mathbb{Z}_n = \{0_n, 1_n, 2_n, \ldots, (n-1)_n\}$ with the understanding that if $b$ is any
integer, and if $b = qn + r$, where $0 \le r < n$, then $b_n = r_n$. Sometimes
we get lazy and just write $\mathbb{Z}_n = \{0, 1, 2, \ldots, n-1\}$ without writing the
subscripts if there is no possibility of confusion. As an example, we see
that $\mathbb{Z}_6 = \{0, 1, 2, 3, 4, 5\}$ with such further stipulations as $8 = 2$, $22 = 4$,
$-5 = 1$. The integers modulo $n$ can be added (and multiplied)
pretty much as ordinary integers; we just need to remember to reduce
the answer modulo $n$.


EXAMPLE. We can write out the sums and products of integers modulo
6 conveniently in tables:


 +  | 0 1 2 3 4 5          ·  | 0 1 2 3 4 5
----+------------         ----+------------
 0  | 0 1 2 3 4 5          0  | 0 0 0 0 0 0
 1  | 1 2 3 4 5 0          1  | 0 1 2 3 4 5
 2  | 2 3 4 5 0 1          2  | 0 2 4 0 2 4
 3  | 3 4 5 0 1 2          3  | 0 3 0 3 0 3
 4  | 4 5 0 1 2 3          4  | 0 4 2 0 4 2
 5  | 5 0 1 2 3 4          5  | 0 5 4 3 2 1
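
Tables of this kind are easy to generate for any modulus. The following is a minimal sketch; the helper name `operation_table` is ours, and each entry is simply the ordinary sum or product reduced modulo $n$.

```python
def operation_table(n: int, op) -> list[list[int]]:
    """Row a, column b holds op(a, b) reduced modulo n, for 0 <= a, b < n."""
    return [[op(a, b) % n for b in range(n)] for a in range(n)]


addition_mod_6 = operation_table(6, lambda a, b: a + b)
multiplication_mod_6 = operation_table(6, lambda a, b: a * b)

for row in multiplication_mod_6:
    print(*row)
# 0 0 0 0 0 0
# 0 1 2 3 4 5
# 0 2 4 0 2 4
# 0 3 0 3 0 3
# 0 4 2 0 4 2
# 0 5 4 3 2 1
```
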


The following story⁷ conveys the spirit of the Chinese Remainder
Theorem:


⁷Apparently due to the Indian mathematician Brahmagupta (598-670).


An old woman goes to market and a horse steps on her basket
and crushes the eggs. The rider offers to pay for the damages
and asks her how many eggs she had brought. She does not
remember the exact number, but she remembers that when
she had taken them out two at a time, there was one egg left.
The same happened when she picked them out three, four,
five, and six at a time, but when she took them seven at a
time they came out even. What is the smallest number of
eggs she could have had?


The solution of the above is expressed by a system of congruences:
if $m$ is the number of eggs that the old woman had, then

$m \equiv 1 \pmod{2}, \quad m \equiv 1 \pmod{3}, \quad m \equiv 1 \pmod{4}, \quad m \equiv 1 \pmod{5}, \quad m \equiv 1 \pmod{6}, \quad m \equiv 0 \pmod{7}.$

Expressed in terms of integers modulo $n$ for various $n$, we can express
the above as


$m_2 = 1_2; \quad m_3 = 1_3; \quad m_4 = 1_4; \quad m_5 = 1_5; \quad m_6 = 1_6; \quad m_7 = 0_7.$


Note first that there is some redundancy in the above problem.
Namely, notice that if $m_4 = 1_4$, then surely $m_2 = 1_2$. Indeed,


$m_4 = 1_4 \;\Rightarrow\; 4 \mid (m - 1) \;\Rightarrow\; 2 \mid (m - 1) \;\Rightarrow\; m_2 = 1_2.$


In exactly the same way we see that the condition $m_6 = 1_6$ implies that
$m_3 = 1_3$. Therefore, we are really faced with the task of finding the
smallest integer $m$ satisfying


$m_4 = 1_4, \quad m_5 = 1_5, \quad m_6 = 1_6, \quad m_7 = 0_7.$


The first question that should occur to the reader is “why is there 
any solution m to the above?” As we’ll see, this will be the point of 
emphasis in the Chinese Remainder Theorem. 


THEOREM. (Chinese Remainder Theorem) Let $a$ and $b$ be positive
integers, and set $d = \gcd(a, b)$. Let $x$ and $y$ be any two integers satisfying
$x_d = y_d$. Then there is always an integer $m$ such that


$m_a = x_a, \quad m_b = y_b.$


Furthermore, if $l = \mathrm{lcm}(a, b)$ then any other solution $m'$ is congruent
to $m$ modulo $l$.


PROOF. First of all since $x_d = y_d$ we know that $d \mid (x - y)$; assume that
$x - y = zd$, for some integer $z$. Next, let $s$ and $t$ be integers satisfying
$sa + tb = d$; from this we obtain


$sza + tzb = zd = x - y.$


From this we see that $x - sza = y + tzb$; we now take $m$ to be this
common value: $m = x - sza = y + tzb$, from which it is obvious that
$m_a = x_a$ and $m_b = y_b$.

Finally, if $m'$ is another solution, then we have $m' \equiv m \pmod{a}$ and
$m' \equiv m \pmod{b}$ and so $m' - m$ is a multiple of both $a$ and $b$. Therefore
$l \mid (m' - m)$ and so $m' \equiv m \pmod{l}$, proving the theorem.
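
The proof is constructive, and the construction $m = x - sza$ can be written out directly. The sketch below follows that construction; the function names and the decision to return the pair $(m, l)$, with $m$ reduced to the range $0 \le m < l$, are our own conventions.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, s, t) with d = gcd(a, b) = s*a + t*b."""
    r0, s0, t0, r1, s1, t1 = a, 1, 0, b, 0, 1
    while r1 != 0:
        q = r0 // r1
        r0, r1 = r1, r0 - q * r1
        s0, s1 = s1, s0 - q * s1
        t0, t1 = t1, t0 - q * t1
    return r0, s0, t0


def crt_pair(x: int, a: int, y: int, b: int):
    """Solve m = x (mod a), m = y (mod b) for positive moduli a, b.

    Returns (m, l) with l = lcm(a, b) and 0 <= m < l, or None when
    x and y are not congruent modulo d = gcd(a, b)."""
    d, s, t = extended_gcd(a, b)
    if (x - y) % d != 0:
        return None
    z = (x - y) // d               # x - y = z*d
    l = a * b // d                 # lcm(a, b)
    m = x - s * z * a              # equals y + t*z*b, as in the proof
    return m % l, l


print(crt_pair(14, 138, 23, 855))       # (12848, 39330)
print(crt_pair(234, 1832, 1099, 2417))  # (830130, 4427944)
```

The multipliers $s$ and $t$ found by the code need not be the ones found by hand, but by the uniqueness part of the theorem the reduced value of $m$ is the same.
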


We’ll consider a couple of examples.
EXAMPLE 1. Solve the simultaneous congruences


$m \equiv 14 \pmod{138}$
$m \equiv 23 \pmod{855}.$


Applying the Euclidean algorithm yields


$855 = 6 \cdot 138 + 27$
$138 = 5 \cdot 27 + 3$
$27 = 9 \cdot 3 + 0,$


and so $d = \gcd(138, 855) = 3$; furthermore the above shows that
$3 = 138 - 5 \cdot 27 = 138 - 5(855 - 6 \cdot 138) = 31 \cdot 138 - 5 \cdot 855$


(and so $s = 31$ and $t = -5$). Also, since $14 \equiv 23 \pmod{3}$ we conclude
that the above congruences can be solved. Indeed, $14 - 23 = -3 \cdot 3$
(so $z = -3$) and so a solution is $m = x - sza = 14 + 31 \cdot 3 \cdot 138 = 12{,}848$.
Finally, we can prove that 12,848 is actually the least
positive integer solution of the above congruences, as follows.
Apply the Chinese Remainder Theorem to conclude that
if $m'$ is any other solution, and if $l = \mathrm{lcm}(138, 855) = 39{,}330$, then
$m' \equiv 12{,}848 \pmod{39{,}330}$. This is clearly enough!


EXAMPLE 2. Find the least positive integer solution of


$m \equiv 234 \pmod{1832}$
$m \equiv 1099 \pmod{2417}.$


This one is technically more involved. However, once one recognizes
that 2417 is a prime number, we see immediately that 2417 and 1832
are relatively prime and so at least we know that a solution exists. Now
comes the tedious part:


$2417 = 1 \cdot 1832 + 585$
$1832 = 3 \cdot 585 + 77$
$585 = 7 \cdot 77 + 46$
$77 = 1 \cdot 46 + 31$
$46 = 1 \cdot 31 + 15$
$31 = 2 \cdot 15 + 1$
$15 = 15 \cdot 1 + 0$


Therefore, $d = \gcd(1832, 2417) = 1$; working backwards through the
above yields


$1 = 157 \cdot 1832 - 119 \cdot 2417$
(so $s = 157$ and $t = -119$). We have $234 - 1099 = -865 = z$ and so a
solution is given by $m = x - sza = 234 + 157 \cdot 865 \cdot 1832 = 248{,}794{,}994$.
Finally, any other solution $m'$ will be congruent to 248,794,994 modulo
$l = \mathrm{lcm}(1832, 2417) = 1832 \cdot 2417 = 4{,}427{,}944$. We therefore reduce
248,794,994 modulo $l$ using the division algorithm:


$248{,}794{,}994 = 56 \cdot 4{,}427{,}944 + 830{,}130,$


and so the least positive integer solution is $m = 830{,}130$.


EXAMPLE 3. In this example we indicate a solution of three congruences.
From this, the student should have no difficulty in solving more
than three congruences, including the lead problem in this subsection.
Find the least positive solution of the congruences


$m \equiv 1 \pmod{6}$
$m \equiv 7 \pmod{15}$
$m \equiv 4 \pmod{19}.$


First, we have

$1 \cdot 15 - 2 \cdot 6 = 3$
from which we conclude that $3 = \gcd(6, 15)$. Next, we have $7 - 1 = 2 \cdot 3$,
and so


$2 \cdot 15 - 2 \cdot 2 \cdot 6 = 2 \cdot 3 = 7 - 1;$

this tells us to set $m_1 = 1 - 2 \cdot 2 \cdot 6 = 7 - 2 \cdot 15 = -23$. We have
already seen that all solutions will be congruent to $-23$ modulo $l_1 = \mathrm{lcm}(6, 15) = 30$.
Therefore, the least positive solution will be $m_1 = 7$
(which could have probably more easily been found just by inspection!).
Note that if $m$ is an integer satisfying $m \equiv 7 \pmod{30}$, then of course we
also have $m \equiv 7 \pmod{6}$ and $m \equiv 7 \pmod{15}$, and so $m \equiv 1 \pmod{6}$
and $m \equiv 7 \pmod{15}$. Therefore, we need to find the least positive
integer solution of

$m \equiv 7 \pmod{30}$
$m \equiv 4 \pmod{19}.$

In this case, $7 \cdot 30 - 11 \cdot 19 = 1$ and so $3 \cdot 7 \cdot 30 - 3 \cdot 11 \cdot 19 = 3 = 7 - 4$,
which tells us to set $m = 7 - 3 \cdot 7 \cdot 30 = 4 - 3 \cdot 11 \cdot 19 = -623$. Any
other solution will be congruent to $-623$ modulo $30 \cdot 19 = 570$; apply
the division algorithm


$-623 = -2 \cdot 570 + 517.$


It follows, therefore, that the solution we seek is $m = 517$.
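
For small moduli, an answer such as 517 can also be confirmed by a direct search that takes the congruences one at a time. This is a different and much cruder technique than the Euclidean construction used in the examples: once $m$ satisfies the congruences processed so far, with combined modulus $l$, only the candidates $m, m + l, m + 2l, \ldots$ need to be tried for the next congruence. The function name and the input format (a list of residue and modulus pairs) are ours.

```python
from math import gcd


def solve_congruences(congruences: list[tuple[int, int]]):
    """Return (m, l): the least non-negative m with m = r (mod n) for every
    pair (r, n), together with l = lcm of the moduli; None if no solution exists."""
    m, l = 0, 1
    for r, n in congruences:
        new_l = l * n // gcd(l, n)
        # The candidates m, m + l, m + 2l, ... run through every possible
        # residue modulo n within new_l // l steps.
        for _ in range(new_l // l):
            if m % n == r % n:
                break
            m += l
        else:
            return None               # the congruences are incompatible
        l = new_l
    return m, l


print(solve_congruences([(1, 6), (7, 15), (4, 19)]))   # (517, 570)
```
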


We conclude this section with a simple corollary to the Chinese
Remainder Theorem; see Exercise 16c on page 63.


COROLLARY TO CHINESE REMAINDER THEOREM. Let $m$ and $n$ be
relatively prime positive integers. Then $\phi(mn) = \phi(m)\phi(n)$.


PROOF. Let $a$, $b$ be positive integers with $1 \le a < m$, $1 \le b < n$, and
$\gcd(a, m) = \gcd(b, n) = 1$. By the Chinese Remainder Theorem, there
is a unique integer $k$, with $1 \le k < mn$ satisfying $k_m = a$, $k_n = b$.
Clearly $\gcd(k, mn) = 1$. Conversely, if the positive integer $k$ is given
with $1 \le k < mn$, and $\gcd(k, mn) = 1$, then setting $a = k_m$, $b = k_n$
produces integers satisfying $1 \le a < m$, $1 \le b < n$ and such that
$\gcd(a, m) = \gcd(b, n) = 1$.


EXERCISES 


1. Let $n$ be a positive integer and assume that $a_1 \equiv b_1 \pmod{n}$ and
that $a_2 \equiv b_2 \pmod{n}$. Show that $a_1 + a_2 \equiv b_1 + b_2 \pmod{n}$ and that
$a_1 a_2 \equiv b_1 b_2 \pmod{n}$.


2. Compute the least positive integer $n$ such that $n \equiv 12{,}245{,}367 \pmod{11}$.
(Hint: this is very easy! Don’t try a direct approach.)


3. Compute the least positive integer $n$ such that $n \equiv 12{,}245{,}367 \pmod{9}$.


4. Find the least positive integer solution of the congruences

$m \equiv 7 \pmod{10}$
$m \equiv 17 \pmod{26}.$


5. Find the least positive integer solution of the congruences


$m \equiv 7 \pmod{10}$
$m \equiv 5 \pmod{26}$
$m \equiv 1 \pmod{12}.$


6. Solve the problem of the woman and the eggs, given at the beginning
of this section.


7. If $A$ and $B$ are sets, one defines the Cartesian product of $A$ and
$B$ by setting


$A \times B = \{(a, b) \mid a \in A \text{ and } b \in B\}.$


Now suppose that the positive integers $m$ and $n$ are relatively
prime, and define the function


$f : \mathbb{Z}_{mn} \to \mathbb{Z}_m \times \mathbb{Z}_n$ by $f(x_{mn}) = (x_m, x_n) \in \mathbb{Z}_m \times \mathbb{Z}_n.$


Using the Chinese remainder theorem, show that the function $f$ is
one-to-one and onto.


8.⁸ The integer $N$ is written as
$N = 102030x05060y$


in decimal (base 10) notation, where $x$ and $y$ are missing digits.
Find the values of $x$ and $y$ so that $N$ has the largest possible
value and is also divisible by both 9 and 4. (Hint: note that
$N \equiv -1 + x + y \pmod{9}$ and $N \equiv y \pmod{4}$.)


⁸This is problem #5 on the January 10, 2008 ASMA (American Scholastic Mathematics
Association) senior division contest.
2.1.4 Primes and the fundamental theorem of arithmetic


We have already defined a prime as a positive integer $p$ greater than
1 whose only positive divisors are 1 and $p$ itself. The following result
may seem a bit obvious to the naive reader. What I want, though, is
for the reader to understand the nature of the proof.⁹


LEMMA. Any positive integer $n > 1$ has at least one prime factor.


PROOF. Denoting by $\mathbb{N}$ the set of positive integers, we define the set


$C = \{\, n \in \mathbb{N} \mid n > 1 \text{ and } n \text{ has no prime factors} \,\}.$


Think of $C$ as the set of “criminals;” naturally we would like to show
that $C = \emptyset$, i.e., that there are no criminals. If $C \ne \emptyset$, then $C$ has a
smallest element in it; call it $c_0$ (the “least criminal”). Since $c_0$ cannot
itself be prime, it must have a non-trivial factorization: $c_0 = c_0' c_0''$,
where $1 < c_0', c_0'' < c_0$. But then, $c_0', c_0'' \notin C$ and hence aren’t criminals.
In particular, $c_0'$ has a prime factor, which is then a factor of $c_0$. So
$c_0$ wasn’t a criminal in the first place, proving that $C = \emptyset$, and we’re
done!


Using the above simple result we can prove the possibly surprising 
result that there are, in fact, infinitely many primes. This was known 
to Euclid; the proof we give is due to Euclid: 


THEOREM. There are infinitely many primes. 


PROOF. (Euclid) Assume, by way of contradiction, that there are only finitely many primes; we may list them:

\[ p_1, p_2, \ldots, p_n. \]

Now form the positive integer $m = 1 + p_1 p_2 \cdots p_n$. Note that none of the primes $p_1, p_2, \ldots, p_n$ can divide $m$, since each of them leaves a remainder of 1. However, because of the above lemma we know that $m$ must have a prime divisor $p \ne p_1, p_2, \ldots, p_n$. Therefore the original list of primes did not contain all of the primes. This contradiction proves the result.

9I will formalize this method of proof in the next section.
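
Euclid’s construction can also be tried out numerically. The following Python sketch is only an illustration; the helper names `smallest_prime_factor` and `euclid_new_prime` are hypothetical and not part of these notes. It forms 1 plus the product of a given finite list of primes and uses trial division to find a prime factor of that number, which by the argument above cannot belong to the list.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the least prime factor of an integer n >= 2 (trial division)."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n  # no divisor up to sqrt(n), so n itself is prime


def euclid_new_prime(primes: list[int]) -> int:
    """Given a finite list of primes, return a prime that is not in the list."""
    m = 1 + prod(primes)  # leaves remainder 1 on division by each listed prime
    return smallest_prime_factor(m)


# 1 + 2*3*5*7*11*13 = 30031 = 59 * 509
print(euclid_new_prime([2, 3, 5, 7, 11, 13]))  # prints 59
```

Note that the number 1 + (product) need not itself be prime, as the example 30031 shows; the argument only needs one of its prime factors.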


Knowing that there are infinitely many primes, one may ask a slightly more subtle question, namely whether the infinite series

\[ \sum_{\text{primes } p} \frac{1}{p} = \frac{1}{2} + \frac{1}{3} + \frac{1}{5} + \frac{1}{7} + \cdots \]

converges or diverges. One can show that this series actually diverges, which shows that the prime numbers are relatively densely packed within the set of positive integers.


There are many unsolved conjectures related to prime numbers; we’ll
just state two such here. The first is related to twin primes, which
are prime pairs of the form $p$, $p + 2$, where both are primes. The first
few twin prime pairs are (3, 5), (5, 7), (11, 13), and so on. The so-called
“Twin Prime” conjecture states that there are an infinite number of
twin primes. The next is the Goldbach conjecture, which states that
any even integer greater than 2 is the sum of two primes. Neither of
these conjectures has been proved.


Using the above method of “criminals”¹⁰ one eventually arrives at the important Fundamental Theorem of Arithmetic:

THEOREM. (Fundamental Theorem of Arithmetic) Any positive integer $n > 1$ has a unique factorization into primes. In other words

(i) there exist primes $p_1 < p_2 < \cdots < p_r$ and exponents $e_1, e_2, \ldots, e_r$ such that

\[ n = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}. \]

(ii) The factorization above is unique in that if $n = q_1^{f_1} q_2^{f_2} \cdots q_s^{f_s}$ then $s = r$, $p_1 = q_1$, $p_2 = q_2, \ldots, p_r = q_r$ and $e_1 = f_1$, $e_2 = f_2, \ldots, e_r = f_r$.


Now let $a$ and $b$ be positive integers greater than 1, and let

\[ a = p_1^{e_1} p_2^{e_2} \cdots p_r^{e_r}, \qquad b = p_1^{f_1} p_2^{f_2} \cdots p_r^{f_r} \]

be the prime factorizations of $a$ and $b$, where some of the exponents might be 0. For each $i = 1, 2, \ldots, r$, let $m_i = \min\{e_i, f_i\}$ and let $M_i = \max\{e_i, f_i\}$. The following should be clear:

\[ \gcd(a, b) = p_1^{m_1} p_2^{m_2} \cdots p_r^{m_r}, \qquad \operatorname{lcm}(a, b) = p_1^{M_1} p_2^{M_2} \cdots p_r^{M_r}. \]

10My surrogate for mathematical induction.

From the above we see that we have two rather different methods 
of finding the greatest common divisor and least common multiple of 
two positive integers. The first is the Euclidean algorithm, which we 
encountered on page 59, and the second is based on the Fundamental 
Theorem of Arithmetic above. On the surface it would appear that the 
latter method is much easier than the former method—and for small 
numbers this is indeed the case. However, once the numbers get large 
then the problem of factoring into primes becomes considerably more 


difficult than the straightforward Euclidean algorithm. 
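
To make the comparison of the two methods concrete, here is a minimal Python sketch, offered as an illustration under stated assumptions rather than as a definitive implementation. The function names (`factorize`, `gcd_lcm_by_factoring`, `gcd_euclid`) are hypothetical, and the factorization is done by naive trial division, which is exactly the step that becomes expensive for large numbers.

```python
from collections import Counter


def factorize(n: int) -> Counter:
    """Prime factorization of n >= 1 as {prime: exponent}, by trial division."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:  # whatever is left is a prime
        factors[n] += 1
    return factors


def gcd_lcm_by_factoring(a: int, b: int) -> tuple[int, int]:
    """gcd and lcm from min/max of the exponents in the prime factorizations."""
    fa, fb = factorize(a), factorize(b)
    g = l = 1
    for p in set(fa) | set(fb):  # a missing prime counts with exponent 0
        g *= p ** min(fa[p], fb[p])
        l *= p ** max(fa[p], fb[p])
    return g, l


def gcd_euclid(a: int, b: int) -> int:
    """gcd by the Euclidean algorithm (repeated division with remainder)."""
    while b:
        a, b = b, a % b
    return a


print(gcd_lcm_by_factoring(12, 18))  # prints (6, 36)
print(gcd_euclid(12, 18))            # prints 6
```

Both routes agree, but the Euclidean version never needs to know a single prime factor of either number.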
EXERCISES 


1. Find the prime factorizations of the numbers 


(a) 12500 
(b) 12345 
(c) 24227 


2. Find the factorization of the numbers p*(p—1)?(p+1)(p?+p+1), 
where p = 2, 3, 5, 7. 


3. Compute the gcd and lcm of the following pairs of numbers


(a) 2090 and 1911 
(b) 20406 and 11999 
(c) 2°41 and 2-1. 


4. Show that if $p$ is a prime, then $p + 1$ and $p^2 + p + 1$ must be relatively
prime. Find integers $s$ and $t$ such that $s(p + 1) + t(p^2 + p + 1) = 1$.


5. Show that there exist unique positive integers $x$ and $y$ satisfying

\[ x^2 + 84x + 2008 = y^2. \]

Find these integers.¹¹

6. For each positive integer $n$, define

\[ H(n) = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}. \]

Prove that if $n \ge 2$, then $H(n)$ is not an integer.

(Hint: Let $k$ be the largest integer such that $2^k \le n$, and let $M$ be the least common multiple of the integers $1, 2, \ldots, 2^k - 1, 2^k + 1, \ldots, n$. What happens when you multiply $H(n)$ by $M$?)


7. Here’s an interesting system of “integers” for which the Fundamental Theorem of Arithmetic fails. Define the set

\[ \mathbb{Z}[\sqrt{-5}] = \{\, a + b\sqrt{-5} \mid a, b \in \mathbb{Z} \,\}. \]

Define primes as on page 60,¹² and show that

\[ 3 \cdot 7 = (1 + 2\sqrt{-5})(1 - 2\sqrt{-5}) = (4 + \sqrt{-5})(4 - \sqrt{-5}) \]

give three distinct prime factorizations of 21. In other words, the uniqueness aspect of the Fundamental Theorem of Arithmetic fails to hold in this case.


8. In this exercise we outline another proof that there exist infinitely many primes. To this end, define the $n$-th Fermat number $F_n$ by setting $F_n = 2^{2^n} + 1$, $n = 0, 1, 2, \ldots$.

(a) Show that $\displaystyle \prod_{m=0}^{n-1} F_m = F_n - 2$, $n = 1, 2, \ldots$ (Induction!)

(b) Conclude from part (a) that the Fermat numbers $F_m$ and $F_n$ are relatively prime whenever $m \ne n$.

11Essentially Problem #4 from the 2008 American Invitational Mathematics Examination.

12Actually, in more advanced treatments, one distinguishes between the notion of a “prime” and
the notion of an “irreducible,” with the latter being defined more or less as on page 60 (I’m trying
to avoid a systematic discussion of “units”). On the other hand, a number $p$ is called prime if
whenever $p \mid ab$, then $p \mid a$ or $p \mid b$. In the above exercise the numbers given are all irreducibles but,
of course, aren’t prime.
(c) Conclude from part (b) that there must be infinitely many
primes.


9. Here’s yet another proof that there are infinitely many primes.¹³

We start with the simple observation that for any integer $n \ge 2$, $n$ and $n + 1$ share no prime factors. Therefore, the product $n(n + 1)$ must contain at least two distinct prime factors. We now generate a sequence of integers as follows. Let

\begin{align*}
n_1 &= 2 \cdot 3 \\
n_2 &= n_1(n_1 + 1) = 42 \\
n_3 &= n_2(n_2 + 1) = 42 \cdot 43 = 1806 \\
&\;\;\vdots
\end{align*}

What is the minimum number of distinct prime factors contained in $n_k$?


10. For any positive integer $n$, let $\tau(n)$ be the number of divisors (including 1 and $n$) of $n$. Thus $\tau(1) = 1$, $\tau(2) = 2$, $\tau(3) = 2$, $\tau(4) = 3$, $\tau(10) = 4$, etc. Give a necessary and sufficient condition for $\tau(n)$ to be odd.

11. Continuation of Exercise 10. For each positive integer $n$, set

\[ S(n) = \tau(1) + \tau(2) + \cdots + \tau(n). \]


Let a be the number of integers n < 2000 for which S(n) is even. 


Compute a. 


2.1.5 The Principle of Mathematical Induction 


In the previous section we showed that every integer $n > 1$ has at least one
prime factor essentially by dividing the integers $n > 1$ into two subsets: the
set of all integers $n$ which have a prime factor, and the set of those which
do not. This latter set was dubbed the set of “criminals” for the sake


13See Filip Saidak, A New Proof of Euclid’s Theorem, Amer. Math. Monthly, Vol. 113,
No. 9, Nov., 2006, 937-938.

14This is a modification of Problem #12 of the American Invitational Mathematics Examination,
2005 (I).
of color. The proof rested on the fact that this set $C$ of criminals must
have a least element, which meant that any positive integer $m$ which
is less than every element of $C$ cannot be a criminal.

Before formalizing the above, let’s take up an example of a somewhat
different nature. Consider the proposition that, for any $n \in \mathbb{N}$, one has

\[ 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n + 1)(2n + 1)}{6}. \]


Naturally, for each such $n$, either the above statement is true or
false. This allows us to divide $\mathbb{N}$ into two subsets: the subset $G$ (for
“good guys”) of integers $n \in \mathbb{N}$ for which the above statement is true,
and the set $C$ (for “criminals”) for which the above statement is false.
Obviously

\[ \mathbb{N} = G \cup C, \quad \text{and} \quad G \cap C = \emptyset. \]

Also (and this is important) note that $1 \in G$, i.e., if $n = 1$, then
the above statement is easily verified to be true. Put differently, 1 is
not a criminal; it’s a good guy!

In order to prove that the above statement is true for all $n \in \mathbb{N}$, we need only show that $C = \emptyset$. Thus, suppose that $C \ne \emptyset$, let $m$ be the least element of $C$, and note that since $1 \notin C$ we have that $m - 1 \in G$: that is to say the above statement is valid with $n = m - 1$. Watch this:

\begin{align*}
1^2 + 2^2 + 3^2 + \cdots + m^2 &= 1^2 + 2^2 + 3^2 + \cdots + (m-1)^2 + m^2 \\
&= \frac{(m-1)m(2m-1)}{6} + m^2 && \text{(This is the key step!)} \\
&= \frac{1}{6}\left(2m^3 - 3m^2 + m + 6m^2\right) && \text{(This is just algebra.)} \\
&= \frac{1}{6}\left(2m^3 + 3m^2 + m\right) = \frac{m(m+1)(2m+1)}{6}. && \text{(A little more algebra.)}
\end{align*}

Let’s have a look at what just happened. We started with the assumption that the integer $m$ is a criminal, the least criminal in fact, and then observed in the end that $1^2 + 2^2 + 3^2 + \cdots + m^2 = \frac{m(m+1)(2m+1)}{6}$, meaning that $m$ is not a criminal. This is clearly a contradiction! What caused this contradiction is the fact that there was an element in $C$, so the only way out of this contradiction is to conclude that $C = \emptyset$.
Therefore every element $n \in \mathbb{N}$ is in $G$, which means that the above
statement is true for all positive integers $n$.

Let’s formalize this a bit. Assume that for each $n \in \mathbb{N}$ we assign a property $P(n)$ to this integer, which may be true or false. In the previous section, the relevant property was

$P(n)$: $n$ has at least one prime factor.

In the example just discussed,

\[ P(n): \quad 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}. \]

The point is that once we have a property assigned to each $n \in \mathbb{N}$, we may consider the set $G \subseteq \mathbb{N}$ of all integers $n$ for which $P(n)$ is true, and the set $C$ (the criminals) of all integers $n$ for which $P(n)$ is false.


In trying to establish that C = @, we may streamline our argument via 


PRINCIPLE OF MATHEMATICAL INDUCTION. Let $\mathbb{N}$ denote the set of positive integers, and assume that for each $n \in \mathbb{N}$ we have a property $P(n)$. Assume that

(i) $P(a)$ is true, for some $a \in \mathbb{N}$. (This “starts” the induction.)

(ii) Whenever $P(m)$ is true for all $a \le m < n$ (the so-called inductive hypothesis), then $P(n)$ is also true.

Then $P(n)$ is true for all $n \ge a$.

PROOF. Let $C$ be the set of all integers $n \ge a$ for which $P(n)$ is false. We shall prove that $C = \emptyset$, which will imply that $P(n)$ is true for all $n \ge a$. By hypothesis (i) above, we see that $a \notin C$; therefore, if $C \ne \emptyset$ and we take $n$ to be the least element of $C$, then $n \ne a$. Therefore, for any positive integer $m$ with $a \le m < n$, $P(m)$ must be true. By hypothesis (ii) above, we conclude that, in fact, $P(n)$ must be true, which says that $n \notin C$. This contradiction proves that $C = \emptyset$, and the proof is complete.


At first blush, it doesn’t appear that the above principle accom- 
plishes much beyond what we were already able to do. However, it 
does give us a convenient language in which to streamline certain arguments. Namely, when we consider an integer $n$ for which $P(m)$ is true
for all $m < n$, we typically simply say,


By induction, P(m) is true for all m <n. 


Let’s see how to use this language in the above two examples. 


EXAMPLE 1. Any integer $n \ge 2$ has at least one prime factor.

PROOF. We shall prove this by induction on $n \ge 2$. Since 2 is a prime factor of itself, we see that the induction starts. Next, assume that $n > 2$ is a given integer. If $n$ is prime then, of course, there’s nothing to prove. Otherwise, $n$ factors as $n = ab$, where $a$ and $b$ are positive integers satisfying $2 \le a, b < n$. By induction $a$ has a prime factor, and hence so does $n$. Therefore, by the principle of mathematical induction we conclude that every integer $n \ge 2$ has a prime factor and the proof is complete.


EXAMPLE 2. For any integer $n \ge 1$ one has

\[ 1^2 + 2^2 + 3^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}. \]

PROOF. We prove this by mathematical induction. The above is clearly true for $n = 1$, and so the induction starts. Next, let $n > 1$ be a given integer. By induction we assume that the above recipe is valid for all positive integers $m < n$. We compute:

\begin{align*}
1^2 + 2^2 + 3^2 + \cdots + n^2 &= 1^2 + 2^2 + 3^2 + \cdots + (n-1)^2 + n^2 \\
&= \frac{(n-1)n(2n-1)}{6} + n^2 && \text{(by induction)} \\
&= \frac{n(n+1)(2n+1)}{6},
\end{align*}

and the proof is complete.
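
A proof by induction is not replaced by a computation, but a quick numerical search for a “least criminal” is a reasonable sanity check on a conjectured formula. The following Python sketch is an illustration only (the function names are hypothetical): it compares the sum of squares with the closed form and reports the first $n$ at which they disagree, if any.

```python
def sum_of_squares(n: int) -> int:
    """Compute 1^2 + 2^2 + ... + n^2 directly."""
    return sum(j * j for j in range(1, n + 1))


def closed_form(n: int) -> int:
    """The conjectured value n(n+1)(2n+1)/6 (the product is divisible by 6)."""
    return n * (n + 1) * (2 * n + 1) // 6


def least_criminal(limit: int):
    """Smallest n <= limit where the formula fails, or None if there is none."""
    for n in range(1, limit + 1):
        if sum_of_squares(n) != closed_form(n):
            return n
    return None


print(least_criminal(500))  # prints None: no criminals among n = 1, ..., 500
```

Of course, finding no criminal up to 500 says nothing about $n = 501$; that is precisely what the induction argument supplies.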


EXERCISES 
1. Prove the following:

(i) $1 + 3 + 5 + \cdots + (2n - 1) = n^2$ $(n = 1, 2, \ldots)$

(ii) $1^3 + 2^3 + 3^3 + \cdots + n^3 = \frac{1}{4} n^2 (n + 1)^2$ $(n = 1, 2, \ldots)$

ie ol 1 

COs 305 
(Do you really need mathematical induction? Try partial frac- 
tions!) 


(iv) 174 (4) | (+) bes Age (n = 2,3,...) 


2. As in Exercise 6 on page 78 we define, for any positive integer $n$,

\[ H(n) = 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}. \]

Show that for any integer $m \ge 0$, $H(2^m) \ge \dfrac{m + 2}{2}$.


3. Let $n$ be a positive integer.

(a) Prove that if $k$ is an integer with $0 < k < n$, then $\displaystyle \binom{n}{k} = \binom{n-1}{k} + \binom{n-1}{k-1}$. (This doesn’t require induction.)

(b) Prove that if $S$ is a set with $n$ elements, and if $0 \le k \le n$, then there are $\binom{n}{k}$ subsets of $S$ with $k$ elements. (Use induction.)


4. Prove that for all $n \ge 1$, $1^3 + 2^3 + \cdots + n^3 = (1 + 2 + 3 + \cdots + n)^2$.


5. Prove that for all $n \ge 1$, and for all $x \ge 0$, that $(1 + x)^n \ge 1 + nx$.
(Is induction really needed?)


6. Prove the classical inequality

\[ \frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n} \ge n^2 \]

whenever $x_1, x_2, \ldots, x_n > 0$ and $x_1 + x_2 + \cdots + x_n = 1$. (Hint: using induction, note first that you can arrive at the inequality

\[ \frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n} + \frac{1}{x_{n+1}} \ge \frac{n^2}{1 - x_{n+1}} + \frac{1}{x_{n+1}}. \]

Next, you need to argue that because $0 < x_{n+1} < 1$,

\[ \frac{n^2}{1 - x_{n+1}} + \frac{1}{x_{n+1}} \ge (n + 1)^2; \]

this is not too difficult. Incidentally, when does equality occur in the above inequality?)


7. Prove that for all integers $n \ge 1$, $\displaystyle 2 \sum_{j=1}^{n} \sin x \cos (2j-1)x = \sin 2nx$.

8. Prove that for all integers $n \ge 0$, $\displaystyle \sin x \prod_{j=0}^{n} \cos 2^j x = \frac{\sin\left(2^{n+1} x\right)}{2^{n+1}}$.

9. Prove that for all integers $n > 0$, $\displaystyle \sum_{j=1}^{n} \sin (2j-1)x = \frac{1 - \cos 2nx}{2 \sin x}$.

10. (This is a bit harder.) Prove the partial fraction decomposition 


1 ttt, cp fn\ 1 
1 heen a 0 QBs 


where n is a non-negative integer. 


11.¹⁵ We shall use mathematical induction to prove that all positive integers are equal. Let $P(n)$ be the proposition

$P(n)$: “If the maximum of two positive integers is $n$ then the integers are equal.”

Clearly $P(1)$ is true. Assuming that $P(n)$ is true, assume that $u$ and $v$ are positive integers such that the maximum of $u$ and $v$ is $n + 1$. Then the maximum of $u - 1$ and $v - 1$ is $n$, forcing $u - 1 = v - 1$ by the validity of $P(n)$. Therefore, $u = v$. What’s wrong with this argument?

15Due to T.I. Ramsamujh, THE MATHEMATICAL GAZETTE, Vol. 72, No. 460 (Jun., 1988), p. 113.


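12. If $A$ is a finite subset of real numbers, let $\pi(A)$ be the product of the elements of $A$. If $A = \emptyset$, set $\pi(A) = 1$. Let $S_n = \{1, 2, 3, \ldots, n\}$, $n \ge 1$, and show that

(a) $\displaystyle \sum_{A \subseteq S_n} \frac{1}{\pi(A)} = n + 1$, and that

(b) $\displaystyle \sum_{A \subseteq S_n} \frac{(-1)^{|A|}}{\pi(A)} = 0$.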


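13. If $A$ is a finite subset of real numbers, let $\sigma(A)$ be the sum of the elements of $A$. Let $n \ge 1$, and set $S_n = \{1, 2, 3, \ldots, n\}$, as above. Show that

(a)¹⁶ $\displaystyle \sum_{A \subseteq S_n} \frac{\sigma(A)}{\pi(A)} = (n^2 + 2n) - (n + 1)\left(1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}\right)$, and that

(b) $\displaystyle \sum_{A \subseteq S_n} \frac{(-1)^{|A|}\sigma(A)}{\pi(A)} = -\frac{1}{n}$.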

2.1.6 Fermat’s and Euler’s theorems 


We start with a potentially surprising observation. Namely we consider
integers $a$ not divisible by 7 and consider powers $a^6$, reduced modulo
7. Note that we may, by the division algorithm, write $a = 7q + r$,
where since $a$ is not divisible by 7, then $1 \le r \le 6$. Therefore, using
the binomial theorem, we get


16This is Problem #2 on the 20th USA Mathematical Olympiad, April 23, 1991. It’s really not 
that hard! 
\[ a^6 = (7q + r)^6 = \sum_{k=0}^{6} \binom{6}{k} (7q)^k r^{6-k} \equiv r^6 \pmod 7. \]

This reduces matters to only six easily verifiable calculations:

\begin{align*}
1^6 &\equiv 1 \pmod 7, & 2^6 &\equiv 1 \pmod 7, & 3^6 &\equiv 1 \pmod 7, \\
4^6 &\equiv (-3)^6 \equiv 1 \pmod 7, & 5^6 &\equiv (-2)^6 \equiv 1 \pmod 7, & 6^6 &\equiv (-1)^6 \equiv 1 \pmod 7.
\end{align*}

In other words, for any integer $a$ not divisible by 7, we have $a^6 \equiv 1 \pmod 7$.


In order to generalize the above result, we shall first make the following observation, namely that if $x$ and $y$ are arbitrary integers, and if $p$ is a prime number, then using exercise 9 on page 62 we get

\[ (x + y)^p = \sum_{k=0}^{p} \binom{p}{k} x^k y^{p-k} \equiv x^p + y^p \pmod p. \]

That is to say, for any integers $x$ and $y$ and any prime number $p$, we have

\[ \boxed{(x + y)^p \equiv x^p + y^p \pmod p}. \]


THEOREM. (Fermat’s Little Theorem) Let $p$ be a prime number. Then
for all integers $a$ not divisible by $p$ we have

\[ a^{p-1} \equiv 1 \pmod p. \]


PROOF. There are a number of proofs of this fact;¹⁷ perhaps the most
straightforward is based on the Binomial Theorem together with the 


17It is interesting to note that while Fermat first observed this result in a letter in 1640, the first
known complete proof was not given until 1736 by Leonhard Euler.
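above observation. Note first that it suffices to assume that $a \ge 1$ and to prove that $a^p \equiv a \pmod p$: when $p$ does not divide $a$, we may cancel a factor of $a$ from both sides to obtain $a^{p-1} \equiv 1 \pmod p$. We shall argue by induction on $a$. Note that if $a = 1$ the result is clearly valid. Next, assuming that $a > 1$, then by induction we may assume that $(a - 1)^p \equiv (a - 1) \pmod p$. From this we proceed:

\begin{align*}
a^p &\equiv ((a - 1) + 1)^p \\
&\equiv (a - 1)^p + 1^p && \text{(by the above result)} \\
&\equiv a - 1 + 1 && \text{(by induction)} \\
&\equiv a \pmod p,
\end{align*}

which completes the proof.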


There is a striking generalization of Fermat’s Little Theorem, as 
follows. I won’t prove this here as the most natural proof of this is 
within the context of group theory. Anyway, recall the Euler $\phi$-function
(see Exercise 16 on page 63), defined by setting

\[ \phi(n) = \#\text{ of integers } m,\ 1 \le m \le n, \text{ which are relatively prime with } n. \]

This obviously says, in particular, that if $p$ is prime then $\phi(p) = p - 1$.

THEOREM. (Euler’s Theorem) Let $n$ be any positive integer. Then for
any integer $a$ with $\gcd(a, n) = 1$ we have

\[ a^{\phi(n)} \equiv 1 \pmod n. \]


Note that Euler’s Theorem obviously contains Fermat’s Little Theorem 
as a corollary. 
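
Since Euler’s Theorem is stated here without proof, it may be reassuring to check it numerically for small moduli. The Python sketch below is illustrative only; the names `phi` and `euler_holds` are hypothetical, and $\phi$ is computed directly from its definition by counting, which is slow but transparent. Python’s built-in three-argument `pow(a, e, n)` computes $a^e$ reduced modulo $n$.

```python
from math import gcd


def phi(n: int) -> int:
    """Euler's phi: the number of m with 1 <= m <= n and gcd(m, n) = 1."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)


def euler_holds(n: int) -> bool:
    """Check a^phi(n) = 1 (mod n) for every a in 1..n relatively prime to n."""
    e = phi(n)
    # "1 % n" rather than 1 so that the trivial modulus n = 1 is handled too
    return all(pow(a, e, n) == 1 % n
               for a in range(1, n + 1) if gcd(a, n) == 1)


print(phi(7), phi(12))                        # prints 6 4
print([pow(a, 6, 7) for a in range(1, 7)])    # prints [1, 1, 1, 1, 1, 1]
print(all(euler_holds(n) for n in range(1, 200)))  # prints True
```

The middle line reproduces the six calculations modulo 7 carried out at the start of this section.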


EXERCISES 


1. Compute the units digit of (23)%" 
2. Compute the least positive integer solution of n = 123'°9(mod 7). 


3. Compute the least positive integer solution of n = 506!’ (mod 11). 


4. Let $p$ be a prime number. The integers $a$ and $b$ are said to be
multiplicative inverses modulo $p$ if $ab \equiv 1 \pmod p$. Using the
Euclidean trick, prove that if $p$ doesn’t divide $a$, then $a$ has a
multiplicative inverse modulo $p$.


5. Find the multiplicative inverse of 2 modulo 29.

6. Find the multiplicative inverse of 3 modulo 113.

7. Prove Wilson’s Theorem:

\[ (p - 1)! \equiv -1 \pmod p, \]

where $p$ is a prime. (Hint: pair each factor of $(p - 1)!$ with its inverse modulo $p$; of course, this requires the result of exercise 4, above.)

8. The order of the integer $a$ modulo the prime $p$ is the least positive integer $n$ such that $a^n \equiv 1 \pmod p$. Show that $n \mid p - 1$. (Hint: show that if $d = \gcd(n, p - 1)$, then $a^d \equiv 1 \pmod p$.)


9. As we saw from Fermat’s little theorem, if $p$ is prime and if $a$ is an integer not divisible by $p$, then $a^{p-1} \equiv 1 \pmod p$. What about the converse? That is, suppose that $n$ is a positive integer and that for every integer $a$ relatively prime to $n$ we have $a^{n-1} \equiv 1 \pmod n$. Must $n$ then be prime? Looking for a counter example takes some time, leading one to (almost) believe this converse. However, suppose that we were to find a candidate composite integer $n$, not divisible by the square of any prime, and found that for every prime divisor $p$ of $n$, that $p - 1 \mid n - 1$. Show that $n$ satisfies the above.¹⁸

10. Here’s a very surprising application of Euler’s Theorem, above.¹⁹ Define the sequence $a_1, a_2, \ldots$ by setting $a_1 = 2$, $a_2 = 2^2$, $a_3 = 2^{2^2}, \ldots$. Then for any integer $n$, the sequence $a_1, a_2, \ldots$ eventually becomes constant (mod $n$). The proof proceeds by induction on $n$ and can be carried out along the following lines.


18Such an integer is called a Carmichael number, the first such being n = 561, which is why 
the converse to Fermat’s little theorem can appear true! It is known that there are, in fact, infinitely 
many Carmichael numbers, which means that there are infinitely many counter examples to the 
converse of Fermat’s little theorem. 

19I’m indebted to my student, Nelson Zhang, for pointing out this exercise, commenting also that
this is Problem #3 on the 1991 USA Olympiad contest. The hints given above are the result of our
discussion.
(a) Since $\phi(n) < n$ for all $n > 1$, we see by induction that the sequence $a_1, a_2, \ldots$ eventually becomes constant modulo $\phi(n)$.

(b) Write $n = 2^r k$, where $k$ is an odd integer. Since $a_1, a_2, \ldots$ eventually becomes constant modulo $\phi(n)$, it also eventually becomes constant modulo $\phi(k)$.

(c) Conclude from Euler’s Theorem (page 87) that $a_1, a_2, \ldots$ eventually becomes constant modulo $k$.

(d) Argue that $a_1, a_2, \ldots$ eventually becomes constant modulo $2^r$ and hence eventually becomes constant modulo $n$.


2.1.7 Linear congruences 


A linear congruence is of the form $ax \equiv b \pmod n$, where $a$, $b$, $n$ are integers, $n > 0$, and $x$ is regarded as unknown. In order to solve this equation, we would hope that $a$ would have an inverse modulo $n$. In other words, if there exists an integer $a'$ such that $a'a \equiv 1 \pmod n$, then we can solve the above congruence by multiplying through by $a'$:

\[ x \equiv a'b \pmod n. \]

Next, if $a$ and $n$ are relatively prime, then we can employ the Euclidean trick and write

\[ sa + tn = 1, \]

for suitable integers $s$ and $t$. But this says already that

\[ sa = 1 - tn \equiv 1 \pmod n, \]

i.e., that $a' = s$ is the desired inverse of $a$ modulo $n$.
EXAMPLE. Solve the congruence $5x \equiv 14 \pmod{18}$.

SOLUTION. We employ the Euclidean algorithm:

\begin{align*}
18 &= 3 \cdot 5 + 3 \\
5 &= 1 \cdot 3 + 2 \\
3 &= 1 \cdot 2 + 1.
\end{align*}

Now work backwards and get

\[ 2 \cdot 18 - 7 \cdot 5 = 1. \]

This says that the inverse of 5 modulo 18 is $-7$. Therefore we see that the solution of the above is

\[ x \equiv -7 \cdot 14 \equiv (-7)(-4) \equiv 28 \equiv 10 \pmod{18}. \]
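
The “work backwards” step can be mechanized by carrying the coefficients $s$ and $t$ along during the Euclidean algorithm. The Python sketch below is one standard way to do this (the extended Euclidean algorithm); the function names are hypothetical, and it assumes, as in the text, that $a$ and $n$ are relatively prime.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, s, t) with g = gcd(a, b) and s*a + t*b == g."""
    old_r, r = a, b
    old_s, s = 1, 0
    old_t, t = 0, 1
    while r != 0:
        q = old_r // r
        # each remainder is kept in the form (coefficient)*a + (coefficient)*b
        old_r, r = r, old_r - q * r
        old_s, s = s, old_s - q * s
        old_t, t = t, old_t - q * t
    return old_r, old_s, old_t


def solve_linear_congruence(a: int, b: int, n: int) -> int:
    """Least non-negative x with a*x = b (mod n), assuming gcd(a, n) = 1."""
    g, s, _ = extended_gcd(a, n)
    if g != 1:
        raise ValueError("a and n must be relatively prime")
    return (s * b) % n  # s is an inverse of a modulo n


print(solve_linear_congruence(5, 14, 18))  # prints 10
```

For $a = 5$, $n = 18$ the coefficient found is $s = -7$, in agreement with the inverse computed by hand above.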


EXERCISE 


1. Solve the linear congruences 
(a) $17x \equiv 4 \pmod{56}$
(b) $26x \equiv 7 \pmod{15}$
(c) $18x \equiv 9 \pmod{55}$


2.1.8 Alternative number bases 


In writing positive integers, we typically write in base 10, meaning 
that the digits represent multiples of powers of 10. For instance, the 
integer 2,396 is a compact way of writing the sum 


\[ 2{,}396 = 6 \cdot 10^0 + 9 \cdot 10^1 + 3 \cdot 10^2 + 2 \cdot 10^3. \]


In a similar way, decimal numbers, such as 734.865 likewise represent 
sums of (possibly negative) powers of 10: 


\[ 734.865 = 5 \cdot 10^{-3} + 6 \cdot 10^{-2} + 8 \cdot 10^{-1} + 4 \cdot 10^0 + 3 \cdot 10^1 + 7 \cdot 10^2. \]

The coefficients are called the (decimal) digits.

Arguably the second-most popular number base is 2, giving binary numbers (or binary representations of numbers). In this case the binary digits include only “0” and “1”. As an example, we can convert a binary number such as 1001101 into its equivalent decimal number by computing the relevant powers of 2:

\[ 1001101 = 1 \cdot 2^0 + 0 \cdot 2^1 + 1 \cdot 2^2 + 1 \cdot 2^3 + 0 \cdot 2^4 + 0 \cdot 2^5 + 1 \cdot 2^6 = 77. \]

Another way of expressing this fact is by writing $77_2 = 1001101$, meaning that the binary representation of the decimal number 77 is 1001101.


EXAMPLE 1. Find the binary representation of the decimal number 
93. 

SOLUTION. First notice that the highest power of 2 less than or equal
to 93 is $2^6$. Next, the highest power of two less than or equal to $93 - 2^6$ is
$2^4$. Continuing, the highest power of 2 less than or equal to $93 - 2^6 - 2^4$
is $2^3$. Eventually we arrive at $93 = 2^6 + 2^4 + 2^3 + 2^2 + 1$, meaning that
$93_2 = 1011101$.


EXAMPLE 2. Find the binary representation of 11111. Note first that if $n$ is the number of binary digits required, then after a moment’s thought one concludes that $n - 1 \le \log_2 11111 < n$. Since $\log_2 11111 = \dfrac{\ln 11111}{\ln 2} \approx 13.44$, we conclude that 11111 will require 14 binary digits. That is to say, $11111 = 2^{13} +$ lower powers of 2. Specifically, one shows that

\[ 11111 = 2^{13} + 2^{11} + 2^9 + 2^8 + 2^6 + 2^5 + 2^2 + 2 + 1. \]

That is to say, $11111_2 = 10101101100111$.


As one would expect, there are $b$-ary representations for any base $b$. For example a trinary representation would be a representation base 3, and the number $n$ of trinary digits needed to represent $m$ would satisfy $n - 1 \le \log_3 m < n$.

EXAMPLE 3. The representation of 11111 in trinary would require 9 trinary digits since $\log_3 11111 \approx 8.48$. Specifically,

\[ 11111 = 1 \cdot 3^8 + 2 \cdot 3^7 + 2 \cdot 3^4 + 1 \cdot 3^2 + 1 \cdot 3 + 2, \]

which says that $11111_3 = 120020112$.

In computer science numbers are sometimes represented in hexadecimal notation (base 16); the “digits” used are 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F. Therefore $17_{16} = 11$, $15_{16} = \mathrm{F}$, $206_{16} = \mathrm{CE}$.
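
The examples above peel off the highest power of the base first. An equivalent and somewhat easier route to program is repeated division by the base, which produces the digits from right to left. The Python sketch below uses that swapped-in technique; the function name `to_base` and the digit string are assumptions made for the illustration.

```python
DIGITS = "0123456789ABCDEF"


def to_base(m: int, b: int) -> str:
    """Digits of the non-negative integer m written in base b (2 <= b <= 16)."""
    if not 2 <= b <= len(DIGITS):
        raise ValueError("base must be between 2 and 16")
    if m < 0:
        raise ValueError("m must be non-negative")
    if m == 0:
        return "0"
    out = []
    while m > 0:
        m, r = divmod(m, b)    # r is the next digit, starting from the units
        out.append(DIGITS[r])
    return "".join(reversed(out))


print(to_base(93, 2))      # prints 1011101
print(to_base(11111, 2))   # prints 10101101100111
print(to_base(11111, 3))   # prints 120020112
print(to_base(206, 16))    # prints CE
```

Python’s built-in `int(text, base)` performs the reverse conversion, so `int(to_base(m, b), b) == m` is a convenient round-trip check.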
EXERCISES 

1. Compute representations of 1435

(a) in binary;
(b) in ternary;
(c) in quaternary (4-ary);
(d) in hexadecimal.

2. Compute representations of 10,000 
(a) in binary; 
(b) in ternary; 
(c) in quaternary (4-ary)
(d) in hexadecimal 


3. The largest known Mersenne prime²⁰ is the number $2^{43,112,609} - 1$.


Compute the number of decimal digits needed to represent this 
huge prime number. Compute the number of binary digits (trivial) 
and the number of ternary digits needed for its representation. 


4. Here’s a bit of a challenge. Represent the decimal .1 ($= \frac{1}{10}$) in binary. What makes this a bit of a challenge is that in binary, the decimal representation is an infinite repeating decimal (or should I say “bi-cimal”?). As a hint, note that $10_2 = 1010$. Now do a long division into 1.²¹

20As of August, 2008; this is a prime of the form $2^p - 1$, where $p$ is prime.
21The answer is $.0\overline{0011}$.
2.1.9 Linear recurrence relations


Many, if not most reasonably serious students have heard of the Fibonacci sequence,²² the first few terms of which are

\[ 1, 1, 2, 3, 5, 8, 13, 21, 34, \ldots \]

Even one who hasn’t had much exposure to mathematics can easily guess the successive term in the sequence given the previous two terms as it is clear that this term is the sum of the previous two terms. Put more mathematically, if $u_n$ denotes the $n$-th term, then one has the linear difference equation

\[ u_{n+2} = u_{n+1} + u_n, \quad n = 0, 1, 2, \ldots, \quad u_0 = 1,\ u_1 = 1. \]


More elementary sequences come from the familiar arithmetic and geometric sequences. Arithmetic sequences are generated by difference equations of the form $u_{n+1} = u_n + d$, $n = 0, 1, 2, \ldots$, where $d$ is a constant. Geometric sequences come from the difference equation $u_{n+1} = k u_n$, $n = 0, 1, 2, \ldots$. The general term for the arithmetic and geometric sequences can be easily solved for in terms of $u_0$:

Arithmetic: $u_{n+1} = u_n + d$, $n = 0, 1, 2, \ldots \implies u_n = u_0 + nd$.

Geometric: $u_{n+1} = k u_n$, $n = 0, 1, 2, \ldots \implies u_n = k^n u_0$.


The above three difference equations are linear in the sense that none of the unknown terms $u_n$ occur to powers other than 1. A very famous nonlinear recurrence relation is the so-called logistic recurrence equation (or “Logistic map”), given by a relation of the form

\[ u_{n+1} = k(1 - u_n)u_n, \quad n = 0, 1, 2, \ldots \]

22which made a cameo appearance in the movie, The Da Vinci Code.

For certain values of $k$, the above sequence can exhibit some very strange, even chaotic, behavior!
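
A few lines of code are enough to watch this behavior change with $k$. The Python sketch below is an illustration only: it assumes the logistic recurrence in its usual form $u_{n+1} = k(1 - u_n)u_n$, and the function name `logistic_orbit` and the sample values of $k$ and $u_0$ are choices made here, not taken from these notes.

```python
def logistic_orbit(k: float, u0: float, steps: int) -> list[float]:
    """Return [u_0, u_1, ..., u_steps] for u_{n+1} = k * (1 - u_n) * u_n."""
    orbit = [u0]
    for _ in range(steps):
        u = orbit[-1]
        orbit.append(k * (1 - u) * u)
    return orbit


# k = 2: the terms settle down to a single limiting value.
print(logistic_orbit(2.0, 0.2, 60)[-1])

# k = 3.2: the terms end up alternating between two distinct values.
print(logistic_orbit(3.2, 0.2, 2000)[-2:])

# k = 4: two starting values that differ by 1e-7 give visibly different orbits.
print(logistic_orbit(4.0, 0.2, 60)[-1], logistic_orbit(4.0, 0.2000001, 60)[-1])
```

For $k = 2$ the limit is the fixed point $u = 1/2$, since $2 \cdot \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{2}$; the sensitivity to the starting value seen at $k = 4$ is the hallmark of the chaotic regime.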


The general homogeneous linear difference equation of order $k$ has the form

\[ u_{n+k} = a_1 u_{n+k-1} + a_2 u_{n+k-2} + \cdots + a_k u_n, \quad n = 0, 1, 2, \ldots \]

Of fundamental importance is the associated characteristic polynomial

\[ C(x) = x^k - a_1 x^{k-1} - a_2 x^{k-2} - \cdots - a_k. \]

The characteristic equation is the equation whose solutions are the zeros of the characteristic polynomial:

\[ x^k - a_1 x^{k-1} - a_2 x^{k-2} - \cdots - a_k = 0. \]

Given the monic²³ polynomial

\[ C(x) = x^k - a_1 x^{k-1} - a_2 x^{k-2} - \cdots - a_k, \]

with real coefficients, and if $u = (u_n)$ is a sequence, we shall denote by $C(u)$ the sequence $u' = (u'_n)_{n \ge 0}$ where

\[ u'_n = u_{n+k} - a_1 u_{n+k-1} - a_2 u_{n+k-2} - \cdots - a_k u_n. \]

Therefore, the task of solving a linear difference equation is to solve

\[ C(u) = v, \]

where $v = (v_n)_{n \ge 0}$ is a given sequence. If $v = 0$ (the sequence all of whose terms are 0) we call the difference equation homogeneous. We shall be primarily concerned with homogeneous difference equations;


23“Monic” simply means that the leading coefficient is 1. 
note, however, that the difference equations leading to arithmetic sequences ($u_{n+1} - u_n = d$, $n = 0, 1, 2, \ldots$) are not homogeneous. We’ll
treat generalizations of the arithmetic sequences in Section 2.1.9, below.


We shall now separate the homogeneous and inhomogeneous cases:²⁴


Homogeneous difference equations 


We shall consider a few commonly occurring cases.


Linear. Given the monic polynomial $C(x)$ we are trying to solve $C(u) = 0$ for the unknown sequence $u = (u_0, u_1, u_2, \ldots)$. Assume that the polynomial is linear: $C(x) = x - k$, for some real constant $k$; thus the difference equation assumes the form

\[ u_{n+1} = k u_n, \quad n = 0, 1, 2, \ldots \tag{2.2} \]

This says that each successive term is $k$ times the preceding term; this is the definition of a geometric sequence with ratio $k$. Clearly, then the solution is

\[ u_n = k^n A, \quad n = 0, 1, 2, \ldots, \tag{2.3} \]


where A is an arbitrary constant. The solution given in equation 
(2.3) above is called the general solution of the first-order dif- 
ference equation (2.2). The particular solution is then obtained 
by specifying a particular value for A. 


24The reader having studied some linear differential equations will note an obvious parallel!
Quadratic: distinct factors over the reals. Next, assume that our polynomial $C(x)$ is quadratic; $C(x) = x^2 - ax - b$, where $a, b \in \mathbb{R}$. Thus, we are trying to solve the second-order homogeneous difference equation

\[ u_{n+2} = a u_{n+1} + b u_n, \quad n = 0, 1, 2, \ldots \tag{2.4} \]

Assume furthermore that $C(x)$ factors into two distinct real linear factors:

\[ C(x) = (x - k_1)(x - k_2), \quad k_1 \ne k_2 \in \mathbb{R}. \]

In this case it turns out that both $u_n = k_1^n A_1$, $n = 0, 1, 2, \ldots$ and $u_n = k_2^n A_2$, $n = 0, 1, 2, \ldots$, where $A_1, A_2 \in \mathbb{R}$, are solutions of (2.4). This is verified by direct substitution: if $u_n = k_1^n A_1$, $n = 0, 1, 2, \ldots$, then

\begin{align*}
u_{n+2} - a u_{n+1} - b u_n &= k_1^{n+2} A_1 - a k_1^{n+1} A_1 - b k_1^n A_1 \\
&= k_1^n A_1 (k_1^2 - a k_1 - b) \\
&= k_1^n A_1 (k_1 - k_1)(k_1 - k_2) = 0.
\end{align*}

This proves that $u_n = k_1^n A_1$, $n = 0, 1, 2, \ldots$ is a solution. Likewise, $u_n = k_2^n A_2$, $n = 0, 1, 2, \ldots$ is another solution. However, what might seem surprising is that the sum

\[ u_n = k_1^n A_1 + k_2^n A_2, \quad n = 0, 1, 2, \ldots \tag{2.5} \]

of these solutions is also a solution of (2.4). Again, this is proved by a direct substitution:

\begin{align*}
u_{n+2} - a u_{n+1} - b u_n
&= k_1^{n+2} A_1 + k_2^{n+2} A_2 - a\left(k_1^{n+1} A_1 + k_2^{n+1} A_2\right) - b\left(k_1^n A_1 + k_2^n A_2\right) \\
&= k_1^{n+2} A_1 - a k_1^{n+1} A_1 - b k_1^n A_1 + k_2^{n+2} A_2 - a k_2^{n+1} A_2 - b k_2^n A_2 \\
&= k_1^n A_1 (k_1^2 - a k_1 - b) + k_2^n A_2 (k_2^2 - a k_2 - b) \\
&= k_1^n A_1 (k_1 - k_1)(k_1 - k_2) + k_2^n A_2 (k_2 - k_1)(k_2 - k_2) = 0 + 0 = 0.
\end{align*}
Finally, one can show that any solution of (2.4) is of the form given
in (2.5). We won’t belabor these details any further.


EXAMPLE 1. Solve the second-order linear homogeneous difference equation

\[ u_{n+2} = u_{n+1} + 2u_n, \quad n = 0, 1, 2, \ldots \]

given that $u_0 = 0$ and $u_1 = 1$.

SOLUTION. Note first that writing down the first few terms of the sequence is easy:

\begin{align*}
u_2 &= u_1 + 2u_0 = 1 + 0 = 1 \\
u_3 &= u_2 + 2u_1 = 1 + 2 = 3 \\
u_4 &= u_3 + 2u_2 = 3 + 2 = 5 \\
u_5 &= u_4 + 2u_3 = 5 + 6 = 11
\end{align*}

and so on. In other words, the first few terms of the sequence look like

\[ u_n = 0, 1, 1, 3, 5, 11, \ldots. \]

What we’re trying to find, however, is a recipe for the general term. Since the characteristic polynomial of this difference equation is $C(x) = x^2 - x - 2 = (x + 1)(x - 2)$, we conclude by equation (2.5) that the solution must look like

\[ u_n = A_1 2^n + A_2 (-1)^n, \quad n = 0, 1, 2, \ldots \]

where $A_1$ and $A_2$ are constants. However, since $u_0 = 0$ and $u_1 = 1$ we obtain

\begin{align*}
0 &= u_0 = A_1 2^0 + A_2 (-1)^0 = A_1 + A_2 \\
1 &= u_1 = A_1 2^1 + A_2 (-1)^1 = 2A_1 - A_2
\end{align*}

all of which implies that $A_1 = \frac{1}{3}$, $A_2 = -\frac{1}{3}$. The particular solution of the above linear difference equation is therefore

\[ u_n = \frac{2^n}{3} - \frac{(-1)^n}{3}, \quad n = 0, 1, 2, \ldots \]
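
As a check on Example 1, the recurrence $u_{n+2} = u_{n+1} + 2u_n$ with $u_0 = 0$, $u_1 = 1$ can be iterated directly and compared with the closed form $u_n = \frac{1}{3}2^n - \frac{1}{3}(-1)^n$ coming from $A_1 = \frac{1}{3}$, $A_2 = -\frac{1}{3}$. The Python sketch below is illustrative; the function names are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used so that no rounding enters the comparison.

```python
from fractions import Fraction


def recurrence_terms(a: int, b: int, u0: int, u1: int, count: int) -> list[int]:
    """First `count` terms of u_{n+2} = a*u_{n+1} + b*u_n."""
    terms = [u0, u1]
    while len(terms) < count:
        terms.append(a * terms[-1] + b * terms[-2])
    return terms[:count]


def solve_constants(k1: int, k2: int, u0: int, u1: int) -> tuple[Fraction, Fraction]:
    """Solve A1 + A2 = u0 and A1*k1 + A2*k2 = u1 for distinct roots k1, k2."""
    A1 = Fraction(u1 - k2 * u0, k1 - k2)
    A2 = u0 - A1
    return A1, A2


A1, A2 = solve_constants(2, -1, 0, 1)   # roots of x^2 - x - 2 are 2 and -1
closed = [A1 * 2**n + A2 * (-1)**n for n in range(10)]

print(A1, A2)                             # prints 1/3 -1/3
print(recurrence_terms(1, 2, 0, 1, 10))   # prints [0, 1, 1, 3, 5, 11, 21, 43, 85, 171]
print(closed == recurrence_terms(1, 2, 0, 1, 10))  # prints True
```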


Quadratic: repeated factor over the reals.

Here we assume that our polynomial $C(x)$ is quadratic with a multiple factor: $C(x) = x^2 - 2kx + k^2 = (x - k)^2$, where $k \in \mathbb{R}$. As in the above case, one solution has the form $u_n = Ak^n$, $n = 0, 1, 2, \ldots$. However, a second solution has the form $u_n = Bnk^n$, $n = 0, 1, 2, \ldots$. We check this by direct substitution:

\begin{align*}
u_{n+2} - 2k u_{n+1} + k^2 u_n &= B(n + 2)k^{n+2} - 2kB(n + 1)k^{n+1} + nBk^2 k^n \\
&= Bk^{n+2}\bigl((n + 2) - 2(n + 1) + n\bigr) = 0.
\end{align*}

Likewise, one can then show that the sum of these solutions is a solution of the second-order homogeneous difference equation:

\[ u_n = Ak^n + Bnk^n, \quad n = 0, 1, 2, \ldots \]

Quadratic: irreducible. In this case we consider the second-order linear homogeneous difference equation whose characteristic polynomial is irreducible (over the reals). Thus the discriminant of the characteristic polynomial is negative (and the polynomial has complex conjugate zeros). A simple example of such would be the difference equation

\[ u_{n+2} = -u_{n+1} - u_n, \quad n = 0, 1, 2, \ldots, \]

since the characteristic polynomial $C(x) = x^2 + x + 1$ is irreducible over the real numbers.


Assume now that we have the second-order homogeneous linear 
difference equation (2.4) has characteristic polynomial with two 
complex zeros a+ bi and a — bi, where a, b € R, and b #4 0. Using 
the same argument in as in the previous section, we may conclude 
that a complex solution of (2.4) is 


ti AO) SO 2 eg 


where A is any real constant. However, since the coefficients in 
the equation (2.4) are real one may conclude that the real and 
imaginary parts of the above complex solution are also solutions. 
Therefore, we would like to find the real and imaginary parts of 
the powers $(a + bi)^n$, $n \geq 0$. To do this we write the complex
number $a + bi$ in trigonometric form. We start by writing

$$a + bi = \sqrt{a^2 + b^2}\left(\frac{a}{\sqrt{a^2 + b^2}} + \frac{bi}{\sqrt{a^2 + b^2}}\right).$$


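Next let $\theta$ be the angle determined by

$$\cos\theta = \frac{a}{\sqrt{a^2 + b^2}}, \qquad \sin\theta = \frac{b}{\sqrt{a^2 + b^2}}.$$

[Figure: triangle showing the angle $\theta$.]

Therefore, writing $r = \sqrt{a^2 + b^2}$, we have $a + bi = r(\cos\theta + i\sin\theta)$, from which one concludes²⁵ that

$$(a + bi)^n = r^n(\cos\theta + i\sin\theta)^n = r^n(\cos n\theta + i\sin n\theta).$$

²⁵ This is usually called DeMoivre's Theorem, and can be proved by a repeated application of the
addition formulas for sine and cosine.

That is to say, the real and imaginary parts of $(a + bi)^n$ are $r^n\cos n\theta$
and $r^n\sin n\theta$, where $\theta$ is as above. From this, one finally concludes
that the solution of (2.4) in the present case has the form

$$u_n = r^n(A\cos n\theta + B\sin n\theta), \quad n = 0, 1, 2, \ldots,$$

where $A$ and $B$ are real constants. When $a^2 + b^2 = 1$ the factor $r^n$ is simply 1.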
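
As a numerical sanity check of the complex-zero case, here is a minimal Python sketch. It assumes the recurrence is written as $u_{n+2} = p\,u_{n+1} + q\,u_n$, so that the characteristic polynomial is $x^2 - px - q$; the names `solve_complex_case`, `iterate`, `p`, and `q` are illustrative and do not come from the text, and the test recurrence is a made-up example. Since the zeros $a \pm bi$ need not have modulus 1, the sketch carries the factor $r^n$ with $r = \sqrt{a^2 + b^2}$.

```python
import cmath
import math

def solve_complex_case(p, q, u0, u1):
    """Closed form for u_{n+2} = p*u_{n+1} + q*u_n when x^2 - p*x - q has complex zeros."""
    disc = p * p + 4 * q               # discriminant of x^2 - p*x - q
    if disc >= 0:
        raise ValueError("characteristic polynomial has real zeros")
    root = (p + cmath.sqrt(disc)) / 2  # the zero a + bi with b > 0
    r, theta = abs(root), cmath.phase(root)
    A = u0                             # condition at n = 0
    B = (u1 / r - A * math.cos(theta)) / math.sin(theta)  # condition at n = 1
    return lambda n: r**n * (A * math.cos(n * theta) + B * math.sin(n * theta))

def iterate(p, q, u0, u1, count):
    """First `count` terms computed directly from the recurrence."""
    terms = [u0, u1]
    while len(terms) < count:
        terms.append(p * terms[-1] + q * terms[-2])
    return terms[:count]

# Hypothetical example: u_{n+2} = 2u_{n+1} - 5u_n, u_0 = 1, u_1 = 3 (zeros 1 + 2i and 1 - 2i).
u = solve_complex_case(2, -5, 1, 3)
print([round(u(n)) for n in range(8)])
print(iterate(2, -5, 1, 3, 8))
```

The two printed lists should agree; comparing the closed form against direct iteration is a cheap way to catch a sign or angle mistake.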
It’s time to come up for air and look at an example. 
EXAMPLE 2. Solve the second-order homogeneous difference equation

$$u_{n+2} + u_{n+1} + u_n = 0, \quad n = 0, 1, 2, \ldots, \qquad (2.6)$$

where $u_0 = 0$, $u_1 = 1$.


SOLUTION. The characteristic polynomial is $C(x) = x^2 + x + 1$, which
has zeros $\frac{-1 + i\sqrt{3}}{2}$ and $\frac{-1 - i\sqrt{3}}{2}$. We write the first complex
number in trigonometric form

$$\frac{-1 + i\sqrt{3}}{2} = \cos\frac{2\pi}{3} + i\sin\frac{2\pi}{3},$$

from which it follows that

$$\left(\frac{-1 + i\sqrt{3}}{2}\right)^n = \cos\frac{2\pi n}{3} + i\sin\frac{2\pi n}{3}.$$

From this it follows that the general solution is given by

$$u_n = A\cos\frac{2\pi n}{3} + B\sin\frac{2\pi n}{3}, \quad n = 0, 1, 2, \ldots$$

However, given that $u_0 = 0$, $u_1 = 1$, we get

$$0 = A,$$
$$1 = A\cos\frac{2\pi}{3} + B\sin\frac{2\pi}{3} = -\frac{A}{2} + \frac{\sqrt{3}}{2}B.$$

Therefore $A = 0$ and $B = \frac{2}{\sqrt{3}}$, forcing the solution to be

$$u_n = \frac{2}{\sqrt{3}}\sin\frac{2\pi n}{3}, \quad n = 0, 1, 2, \ldots$$
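
The solution of Example 2 can be checked numerically. The following short Python sketch (function names are illustrative only) generates the sequence from the recurrence $u_{n+2} = -u_{n+1} - u_n$ with $u_0 = 0$, $u_1 = 1$ and compares it with the closed form $\frac{2}{\sqrt{3}}\sin\frac{2\pi n}{3}$.

```python
import math

def example2_by_recurrence(count):
    """First `count` terms of u_{n+2} = -u_{n+1} - u_n with u_0 = 0, u_1 = 1."""
    terms = [0, 1]
    while len(terms) < count:
        terms.append(-terms[-1] - terms[-2])
    return terms[:count]

def example2_closed_form(n):
    """The closed form (2 / sqrt(3)) * sin(2*pi*n / 3)."""
    return 2 / math.sqrt(3) * math.sin(2 * math.pi * n / 3)

terms = example2_by_recurrence(12)
print(terms)
print(all(abs(example2_closed_form(n) - t) < 1e-9 for n, t in enumerate(terms)))
```

The sequence is periodic with period 3, which matches the fact that the zeros of $x^2 + x + 1$ are cube roots of unity.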


Higher-degree characteristic polynomials. 


We won't treat this case systematically, except to say that upon 
factoring the polynomial into irreducible linear and quadratic fac- 
tors, then one can proceed as indicated above (see Exercise 14). 
Additional complications result with higher-order repeated factors 
which we don’t treat here. 


Higher-order differences 


In Section 2.1.9 we treated only the so-called homogeneous linear 
difference equations. An inhomogeneous linear difference equation 
has the general form 


C(u) =v, 


where $C(x)$ is a monic polynomial, $v = (v_n)_{n \geq 0}$ is a given sequence and
where $u = (u_n)_{n \geq 0}$ is the unknown sequence.


We have already encountered such an example above, in the example
on page 312 giving an arithmetic sequence:


$$u_{n+1} - u_n = d, \quad n = 0, 1, 2, \ldots$$


We won't treat inhomogeneous linear difference equations in any
detail except for a very special case, namely those having constant
higher-order differences. The arithmetic sequences have constant
first-order differences; if $d$ is this difference then we have
$u_{n+1} - u_n = d$, $n = 0, 1, 2, \ldots$. Suppose next that the second-order differences
are constant: this means that the difference of the difference is constant,
written as

$$(u_{n+2} - u_{n+1}) - (u_{n+1} - u_n) = d, \quad n = 0, 1, 2, \ldots$$

In other words,

$$u_{n+2} - 2u_{n+1} + u_n = d, \quad n = 0, 1, 2, \ldots$$

Writing more compactly and in terms of the characteristic polynomial,
we have

$$C(u) = d, \quad n = 0, 1, 2, \ldots, \quad \text{where } C(x) = (x - 1)^2,$$

and where $d$ is the constant sequence $d, d, \ldots$.


Constant third-order differences with constant difference $d$ would be
expressed as

$$\big((u_{n+3} - u_{n+2}) - (u_{n+2} - u_{n+1})\big) - \big((u_{n+2} - u_{n+1}) - (u_{n+1} - u_n)\big) = d, \quad n = 0, 1, 2, \ldots,$$

i.e.,

$$u_{n+3} - 3u_{n+2} + 3u_{n+1} - u_n = d, \quad n = 0, 1, 2, \ldots$$

Again, a compact representation of this difference equation is

$$C(u) = d, \quad n = 0, 1, 2, \ldots, \quad \text{where } C(x) = (x - 1)^3.$$

Continuing along these lines we see that a sequence with constant $k$-th
order differences can be expressed via

$$C(u) = d, \quad n = 0, 1, 2, \ldots, \quad \text{where } C(x) = (x - 1)^k. \qquad (2.7)$$

Such difference equations can be solved in principle; in fact the general
solution of (2.7) can be expressed as a polynomial. We shall summarize
as a theorem below.²⁶

THEOREM. A sequence $u_0, u_1, u_2, \ldots$ is expressible as a polynomial of
degree $k$ in $n$ if and only if its $k$-th order differences are constant.
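
The statement of the theorem is easy to experiment with. The sketch below is only an illustration (the helper name `differences` and the sample cubic are assumptions made for the example): it applies the difference operator $v_n = u_{n+1} - u_n$ repeatedly to the values of a cubic polynomial and observes that the third-order differences are constant.

```python
def differences(seq, order=1):
    """Apply the difference operator v_n = u_{n+1} - u_n `order` times."""
    for _ in range(order):
        seq = [b - a for a, b in zip(seq, seq[1:])]
    return seq

# Hypothetical sample: u_n = n^3 - 2n + 5 for n = 0, ..., 9.
cubic = [n**3 - 2 * n + 5 for n in range(10)]
print(differences(cubic, 1))   # first-order differences: not constant
print(differences(cubic, 2))   # second-order differences: not constant
print(differences(cubic, 3))   # third-order differences: constant
```

For a polynomial with leading term $a n^k$ the constant value of the $k$-th order differences is $a \cdot k!$, which here is $1 \cdot 3! = 6$.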


PROOF. Assume that the $k$-th order differences of the sequence
$u_0, u_1, u_2, \ldots$ are constant. We shall prove by induction on $k$ that $u_n$
is expressible as a polynomial of degree $k$ in $n$. So assume that $k > 1$
and that the result is valid whenever we have constant $m$-th order
differences, where $m < k$ is a positive integer.

We set $v_0 = u_1 - u_0$, $v_1 = u_2 - u_1$, $v_2 = u_3 - u_2, \ldots$; then we have a
sequence whose $(k-1)$-st order differences are constant. By induction,
we have a representation of the form

$$v_n = b_{k-1}n^{k-1} + b_{k-2}n^{k-2} + \cdots + b_1 n + b_0,$$

for suitable constants $b_0, b_1, b_2, \ldots, b_{k-1}$.

Next (and this is the relatively difficult part) let $a_1, a_2, \ldots, a_k$
be the unique solution of the linear equations represented in matrix
form:

$$\begin{pmatrix} \binom{1}{0} & \binom{2}{0} & \binom{3}{0} & \cdots & \binom{k}{0} \\ 0 & \binom{2}{1} & \binom{3}{1} & \cdots & \binom{k}{1} \\ 0 & 0 & \binom{3}{2} & \cdots & \binom{k}{2} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \binom{k}{k-1} \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ a_3 \\ \vdots \\ a_k \end{pmatrix} = \begin{pmatrix} b_0 \\ b_1 \\ b_2 \\ \vdots \\ b_{k-1} \end{pmatrix}.$$

Having solved the above, one then verifies that the following equation holds:

$$a_k(n+1)^k + a_{k-1}(n+1)^{k-1} + \cdots + a_1(n+1)$$
$$= a_k n^k + a_{k-1}n^{k-1} + \cdots + a_1 n + b_{k-1}n^{k-1} + b_{k-2}n^{k-2} + \cdots + b_1 n + b_0.$$

²⁶ As an alternative to using the theorem, note that if a sequence $u = (u_n)$ has constant $k$-th
order differences, then, in fact, $u$ satisfies the homogeneous difference equation $C(u) = 0$, where
$C(x) = (x-1)^{k+1}$. One can now proceed along the lines of the "repeated factor" case, given above.

Finally, we set $a_0 = u_0$ and use the fact that $u_{n+1} = u_n + v_n$ to check
that

$$u_n = a_k n^k + a_{k-1}n^{k-1} + \cdots + a_1 n + a_0,$$

and we are finished, as we have successfully proved that the terms of
the sequence $u_0, u_1, \ldots$ are expressible as a polynomial of degree $k$.

As for the converse, we assume that the sequence $u_0, u_1, u_2, \ldots$ is
expressible as a polynomial of degree $k$ in $n$:

$$u_n = a_k n^k + a_{k-1}n^{k-1} + \cdots + a_1 n + a_0;$$

we shall show by induction on $k$ that the $k$-th order differences are
constant. To this end, note that the first-order differences are

$$v_n = u_{n+1} - u_n = \big(a_k(n+1)^k + a_{k-1}(n+1)^{k-1} + \cdots + a_0\big) - \big(a_k n^k + a_{k-1}n^{k-1} + \cdots + a_0\big)$$
$$= \text{polynomial in } n \text{ of degree at most } k - 1.$$

By induction, the sequence $(v_n)_{n \geq 0}$ has constant $(k-1)$-st differences.
But then it follows immediately that the sequence $(u_n)_{n \geq 0}$ must have
constant $k$-th order differences, and we are done!


EXAMPLE 3. Solve the inhomogeneous linear difference equation

$$u_{n+2} - 2u_{n+1} + u_n = 1, \quad n = 0, 1, 2, \ldots, \quad u_0 = 2,\ u_1 = 4.$$


SOLUTION. The difference equation says that the second-order differences
are constant and equal to 1; this implies that the sequence must
be quadratic, say

$$u_n = an^2 + bn + c, \quad n = 0, 1, 2, \ldots$$

Note first that we can solve for the leading coefficient $a$ by substituting
the polynomial $an^2 + bn + c$ into the above difference equation and noting that
the linear terms ($bn + c$) have zero second-order differences and hence
don't contribute. This gives

$$a(n+2)^2 - 2a(n+1)^2 + an^2 = 1,$$

which quickly reduces to $2a = 1$, so $a = \frac{1}{2}$. Next, we find $b$ and $c$ by
using the initial conditions:

$$c = 2,$$
$$\tfrac{1}{2} + b + c = 4.$$

This quickly leads to $b = \frac{3}{2}$, $c = 2$ and so the solution is given by

$$u_n = \tfrac{1}{2}n^2 + \tfrac{3}{2}n + 2, \quad n = 0, 1, 2, \ldots$$
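
A quick way to confirm the answer to Example 3 is to compare the quadratic with the values produced by the recurrence itself. The sketch below (helper names are illustrative) uses exact rational arithmetic via `fractions.Fraction` so that the comparison involves no rounding.

```python
from fractions import Fraction

def example3_by_recurrence(count):
    """First `count` terms of u_{n+2} = 2u_{n+1} - u_n + 1 with u_0 = 2, u_1 = 4."""
    terms = [2, 4]
    while len(terms) < count:
        terms.append(2 * terms[-1] - terms[-2] + 1)
    return terms[:count]

def example3_closed_form(n):
    """The quadratic (1/2)n^2 + (3/2)n + 2, evaluated exactly."""
    return Fraction(1, 2) * n * n + Fraction(3, 2) * n + 2

print(example3_by_recurrence(8))
print([int(example3_closed_form(n)) for n in range(8)])
```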


EXERCISES 
1. Let $(u_n)_{n \geq 0}$ be an arithmetic sequence. Prove that the sequence
$(e^{u_n})_{n \geq 0}$ is a geometric sequence.

2. Let $(u_n)_{n \geq 0}$ be a geometric sequence with $u_n > 0$ for all $n$. Prove
that $(\log u_n)_{n \geq 0}$ is an arithmetic sequence.

3. Consider the "counting sequence" 1, 2, 3, ....

(a) Represent this sequence as the solution of an inhomogeneous
first-order linear difference equation.

(b) Represent this sequence as the solution of a homogeneous
second-order linear difference equation. Find the general
solution of this second-order difference equation.

4. Solve the linear difference equation $u_{n+1} = -2u_n$, $n = 0, 1, 2, \ldots$,
where $u_0 = 2$.

5. Solve the second-order difference equation $u_{n+2} = -4u_{n+1} + 5u_n$, $n = 0, 1, 2, \ldots$
where $u_0 = 1 = u_1$.

6. Solve the second-order difference equation $u_{n+2} = -4u_{n+1} - 4u_n$, $n = 0, 1, 2, \ldots$
where $u_0 = 1$, $u_1 = 0$.

7. Solve the Fibonacci difference equation $u_{n+2} = u_{n+1} + u_n$, $n = 0, 1, 2, \ldots$
where $u_0 = u_1 = 1$.

8. Let $F(n)$, $n = 0, 1, 2, \ldots$ be the Fibonacci numbers. Use your result
from Exercise #7 to compute the number of digits in $F(1000000)$.
(Hint: use $\log_{10}$ and focus on the "dominant term.")

9. Consider the "generalized Fibonacci sequence," defined by $u_0 = 1$,
$u_1 = 1$, and $u_{n+2} = au_{n+1} + bu_n$, $n \geq 0$; here $a$ and $b$ are
positive real constants.

(a) Determine the conditions on $a$ and $b$ so that the generalized
Fibonacci sequence remains bounded.

(b) Determine conditions on $a$ and $b$ so that $u_n \to 0$ as $n \to \infty$.

10. The Lucas numbers are the numbers $L(n)$, $n = 0, 1, 2, \ldots$ where
$L(0) = 2$, $L(1) = 1$, and where (just like the Fibonacci numbers)
$L(n+2) = L(n+1) + L(n)$, $n \geq 0$. Solve this difference equation,
thereby obtaining an explicit formula for the Lucas numbers.

11. Let $F(n)$, $L(n)$, $n \geq 0$ denote the Fibonacci and Lucas numbers,
respectively. Show that for each $n \geq 1$, $L(n) = F(n+1) + F(n-1)$.

12. Solve the second-order difference equation $u_{n+2} = -4u_n$, $n = 0, 1, 2, \ldots$
where $u_0 = 1 = u_1$.


13. Solve the second-order difference equation $u_{n+2} = 2u_{n+1} - 2u_n$, $n = 0, 1, 2, \ldots$,
$u_0 = 0$, $u_1 =$ [value illegible in the source].

14. Solve the third-order difference equation $u_{n+3} = -3u_{n+2} + u_{n+1} + 3u_n$, $n = 0, 1, 2, \ldots$,
$u_0 = 1$, $u_1 = 1$, $u_2 = -1$.

15. Solve the inhomogeneous linear difference equation


Cio = 2a Py = oe eS Od ee.) gy Se yy 0, 
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16. Solve the inhomogeneous linear difference equation 


Uinta ou As) A oUn th — te 2 a Oe es 
te = 0. SB 4 = 10s = 20: 
17. Given the sequence $u_0, u_1, u_2, \ldots$, note that the first few $k$-th order
differences are

first-order: $u_n - u_{n-1}$

second-order: $(u_n - u_{n-1}) - (u_{n-1} - u_{n-2}) = u_n - 2u_{n-1} + u_{n-2}$

third-order: $\big((u_n - u_{n-1}) - (u_{n-1} - u_{n-2})\big) - \big((u_{n-1} - u_{n-2}) - (u_{n-2} - u_{n-3})\big) = u_n - 3u_{n-1} + 3u_{n-2} - u_{n-3}$

Find a general formula for the $k$-th order differences and prove this
formula.


18. As we have seen, the sequence $u_n = n^k$ has constant $k$-th order
differences. Therefore,

$$\sum_{l=0}^{k}\binom{k}{l}(-1)^l u_{n-l} = \sum_{l=0}^{k}\binom{k}{l}(-1)^l(n-l)^k = \text{constant},$$

i.e., is independent of $n$.

(a) Conclude from this that one has the curious combinatorial
identity: if $r < k$, then

$$\sum_{l=0}^{k}\binom{k}{l}(-1)^l l^r = 0.$$

(Hint: Show that for each such $r$ the above expression is, up to a nonzero
constant factor, the coefficient of $n^{k-r}$ in the constant polynomial

$$\sum_{l=0}^{k}\binom{k}{l}(-1)^l(n-l)^k.)$$

(b) Using part (a) show that

$$\sum_{l=0}^{k}\binom{k}{l}(-1)^l l^k = (-1)^k k!$$

(Hint: this can be shown using induction²⁷.)

(c) Conclude that if $C(x) = (x-1)^k$, and if the solution of $C(u) = d$,
where $d = d, d, \ldots$, is written as

$$u_n = an^k + \text{lower-degree terms in } n,$$

then $a = \frac{d}{k!}$.


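19. Let $F_1 = 1$, $F_2 = 1$, $F_3 = 2, \ldots$ be the Fibonacci sequence. Show
that one has the curious identity

$$\frac{x}{1 - x - x^2} = \sum_{k=1}^{\infty} F_k x^k.$$

(Just do the long multiplication showing that
$\left(\sum_{k=1}^{\infty} F_k x^k\right)(1 - x - x^2) = x$. This says that the rational function
$\frac{x}{1 - x - x^2}$ is a generating function for the Fibonacci sequence.)

²⁷ Here's how. Using $l\binom{k}{l} = k\binom{k-1}{l-1}$, then part (a) with $k-1$ in place of $k$, and finally the induction hypothesis:

$$\sum_{l=0}^{k}\binom{k}{l}(-1)^l l^k = k\sum_{l=1}^{k}\binom{k-1}{l-1}(-1)^l l^{k-1}$$
$$= -k\sum_{l=0}^{k-1}\binom{k-1}{l}(-1)^l(l+1)^{k-1}$$
$$= -k\sum_{l=0}^{k-1}\binom{k-1}{l}(-1)^l\sum_{m=0}^{k-1}\binom{k-1}{m}l^m$$
$$= -k\sum_{m=0}^{k-1}\binom{k-1}{m}\sum_{l=0}^{k-1}\binom{k-1}{l}(-1)^l l^m$$
$$= -k\sum_{l=0}^{k-1}\binom{k-1}{l}(-1)^l l^{k-1}$$
$$= -k\cdot(-1)^{k-1}(k-1)! = (-1)^k k!.$$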


20. A sequence $a_1, a_2, a_3, \ldots$ of real numbers is called a harmonic
sequence if for each $n \geq 1$, $a_{n+1}$ is the harmonic mean of $a_n$
and $a_{n+2}$ (see Exercise 9 of page 42). Show that a given sequence
$a_1, a_2, \ldots$ is a harmonic sequence if and only if all $a_i \neq 0$ and the
sequence of reciprocals $\frac{1}{a_1}, \frac{1}{a_2}, \frac{1}{a_3}, \ldots$ is an arithmetic sequence.


2.2 Elementary Graph Theory 


In this section we shall consider one of the most important topics in 
contemporary discrete mathematics—that of a graph. This concept 
has a huge variety of applications and has become especially important 
to the relatively new discipline of management science. 


Mathematically, a graph is easy enough to define. It consists of a set
$V$ of vertices and a numerical relationship between pairs of vertices
(sort of a "distance" or "cost" function). Namely, between any two
vertices $v_i$ and $v_j$ is a non-negative real number $c_{ij}$ such that it is
always true that $c_{ij} = c_{ji}$. If $c_{ij} \neq 0$ we call $\{v_i, v_j\}$ an edge. Put
intuitively, the cost of getting from vertex $v_i$ to $v_j$ is the same as the
cost of getting from vertex $v_j$ to $v_i$. In other words, the matrix $C = [c_{ij}]$
is a symmetric matrix.²⁸ This matrix is called the adjacency matrix
of the graph.


If the costs $c_{ij}$ satisfy $c_{ii} = 0$ for all indices $i$, and $c_{ij}$ is always 0 or
1, then we call the graph a simple graph; otherwise we call the graph
a weighted graph. Perhaps the pictures below will clarify this.


²⁸ If this matrix isn't symmetric, then the graph is called a directed graph; we'll study those
briefly in Subsection 2.2.3.

[Figure: examples of simple graphs and of non-simple graphs, the latter including weighted edges and a loop.]

Other definitions are as follows. An edge is called a loop if it joins
a vertex to itself (see the above figure). Let $v_i$ and $v_j$ be vertices in a
graph. We say that $v_i$ and $v_j$ are adjacent if there is an edge joining
$v_i$ and $v_j$ (that is, if the cost $c_{ij} > 0$). Also,


A walk in a graph is a sequence of linked edges.


A trail in a graph is a sequence of linked edges such that no edge
appears more than once.


A path in a graph is a walk with no repeated vertices. 
A circuit in a graph is a trail that begins and ends at the same vertex. 


A cycle in a graph is a path which begins and ends at the same vertex. 


If any two vertices of a graph can be joined by a path, then the 
graph is called connected. 


2.2.1 Eulerian trails and circuits 


Suppose that a postman is charged with delivering mail to residences
in a given town. In order to accomplish this in an efficient manner he
would ideally choose a route that would allow him to avoid walking
the same street twice. Thus, if the town is represented by a simple
graph whose edges represent the streets, then the problem is clearly
that of finding a trail in the graph which includes every edge: such a
trail is called an Eulerian trail. If the postman is to begin and end at
the same vertex, then what is sought is an Eulerian circuit. General
problems such as this are called routing problems.


CLASSIC EXAMPLE. In the ancient city of Königsberg (Germany) there
were seven bridges, arranged in a "network" as depicted in the figure
below:

[Figure: Geographic Map: The Königsberg Bridges.]

A prize was offered to anyone who could determine a route by which
each of the bridges can be traversed once and then return to the starting
point.

A casual inspection of the above layout of bridges shows that this can be
represented by a graph having four vertices and seven edges, as in the
graph to the right.

[Figure: graph with four vertices $A$, $B$, $C$, $D$ and seven edges representing the bridges.]

From the above, we see that the adjacency matrix for the seven bridges of
Königsberg with labeling $A = 1$, $B = 2$, $C = 3$, and $D = 4$ is given by

[Adjacency matrix missing in the source.]

DEFINITION. The degree of a vertex $v$ in a graph is the number of
edges on this vertex. A loop on a vertex is counted twice in computing
the degree of a vertex.


Notice that if we are given the adjacency matrix, then the sum of
the elements of the $i$-th row is the degree of the vertex $i$.


THEOREM. Let $G$ be a finite graph with adjacency matrix $A$. Then the
number of walks of length 2 from vertex $v_i$ to vertex $v_j$ is the $(i, j)$ entry
of $A^2$. More generally, the number of walks of length $k$ from vertex $v_i$
to vertex $v_j$ is the $(i, j)$ entry of $A^k$.
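
To see the walk-counting theorem in action, here is a small Python sketch using plain nested lists. The triangle graph below is a hypothetical example chosen for illustration, and the helper names `mat_mul` and `mat_pow` are not from the text.

```python
def mat_mul(P, Q):
    """Product of two square matrices stored as lists of rows."""
    size = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def mat_pow(A, k):
    """k-th power of the square matrix A (k >= 0) by repeated multiplication."""
    size = len(A)
    result = [[int(i == j) for j in range(size)] for i in range(size)]  # identity
    for _ in range(k):
        result = mat_mul(result, A)
    return result

# Hypothetical example: the triangle on vertices v1, v2, v3.
triangle = [[0, 1, 1],
            [1, 0, 1],
            [1, 1, 0]]

print(mat_pow(triangle, 2))  # entry (i, j) counts walks of length 2 from vi to vj
print(mat_pow(triangle, 3))  # entry (i, j) counts walks of length 3 from vi to vj
```

For instance, the diagonal entries of the square are all 2, reflecting the two walks of length 2 from a vertex back to itself (out to either neighbor and back).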


A moment’s thought is also enough to be convinced of the following 
theorem: 


THEOREM. (Euler’s Theorem) Let G be a graph. 


(i) If the graph has any vertices of odd degree, then G cannot contain 
an Eulerian circuit. 


(ii) If the graph has more than two vertices of odd degree, then G 
cannot contain an Eulerian trail. 


As a result of Euler's theorem, we see that the bridges of Königsberg
problem has no solution!


EXAMPLE 1. The picture to the right depicts a graph $G$ with exactly two
vertices of odd degree, one at vertex $A$ and one at vertex $B$. The reader should
have no difficulty in concluding that $G$ has no Eulerian circuits but does have an
Eulerian trail from $A$ to $B$ (or from $B$ to $A$).

[Figure: the graph $G$, with the vertices $A$ and $B$ marked.]


Notice that if we add the degrees of all the vertices in a graph, then 
every edge gets counted twice; this already proves the following.




THEOREM. (Euler’s Degree Theorem) The sum of the degrees of the 
vertices equals twice the number of edges in the graph. 


As a result, one has 


COROLLARY. The number of vertices of odd degree must be an even 
number. 
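
The degree count and the parity test in Euler's theorem are easy to mechanize. The sketch below is illustrative only: it assumes a loop-free graph given by its adjacency matrix, and it uses one common labeling of the Königsberg graph (vertex 1 is the land mass touched by five bridges), which is an assumption here and not taken from the text.

```python
def degrees(adj):
    """Vertex degrees as row sums of the adjacency matrix (assumes no loops)."""
    return [sum(row) for row in adj]

def euler_obstruction(adj):
    """Report what the parity of the degrees rules out."""
    odd = sum(1 for d in degrees(adj) if d % 2 == 1)
    if odd == 0:
        return "no obstruction to an Eulerian circuit"
    if odd == 2:
        return "no Eulerian circuit; no obstruction to an Eulerian trail"
    return "neither an Eulerian circuit nor an Eulerian trail"

# Assumed labeling of the seven bridges: entries count the bridges between land masses.
konigsberg = [[0, 2, 2, 1],
              [2, 0, 0, 1],
              [2, 0, 0, 1],
              [1, 1, 1, 0]]

print(degrees(konigsberg))                 # degree of each land mass
print(sum(degrees(konigsberg)) // 2)       # number of edges, by the degree theorem
print(euler_obstruction(konigsberg))
```

All four degrees are odd, an even number of odd vertices as the corollary requires, and more than two, so no Eulerian trail is possible.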


The above results are negative in the sense that they tell us when 
it’s impossible to construct Eulerian circuits or Eulerian trails. We 
shall give an algorithm which allows us to find an Eulerian circuit in a 
graph all of whose degrees are even. 


Fleury’s algorithm for finding an Eulerian circuit 


Assume that we are given the graph G all of whose vertex degrees are 
even. In tracing a trail in G, after having traveled along an edge E, we 
shall remove this edge (we have “burned our bridges behind us”). 


Step 1. Pick a vertex X. 


Step 2. Move from X to an adjacent vertex Y along the edge E
unless removing E disconnects the graph. (There may be several
choices. Also, if there is only one choice, you need to take this
choice!)


Step n. Return finally to X. 


The above algorithm is depicted in the following sequence. The 
dotted edges represent the removed edges. 


[Figure: sequence of panels tracing Fleury's algorithm from the vertex X, with the removed edges dotted. Final panel: Done! An Eulerian
circuit has been found.]
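
The steps of Fleury's algorithm can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a tuned implementation: the graph is assumed connected with all degrees even, it is given as a list of edges, and the helper names (`fleury`, `is_safe`, and so on) as well as the "bow-tie" test graph are hypothetical. An edge is treated as forbidden when removing it reduces the number of vertices reachable from the current vertex, unless it is the only edge available.

```python
def reachable_count(adj, start):
    """Number of vertices reachable from `start` along remaining edges."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen)

def remove_edge(adj, v, w):
    adj[v].remove(w)
    adj[w].remove(v)

def add_edge(adj, v, w):
    adj[v].append(w)
    adj[w].append(v)

def is_safe(adj, v, w):
    """True if edge {v, w} may be taken now: it is the only choice, or removing
    it does not cut off anything still reachable from v."""
    if len(adj[v]) == 1:
        return True
    before = reachable_count(adj, v)
    remove_edge(adj, v, w)
    after = reachable_count(adj, v)
    add_edge(adj, v, w)
    return after == before

def fleury(edges, start):
    """Trace an Eulerian circuit, burning each edge after it is used."""
    adj = {}
    for v, w in edges:
        add_edge(adj.setdefault(v, []) and adj or adj, v, w) if False else None
        adj.setdefault(v, []).append(w)
        adj.setdefault(w, []).append(v)
    circuit, current = [start], start
    for _ in range(len(edges)):
        # iterate over a copy, since is_safe temporarily edits the lists
        nxt = next(w for w in list(adj[current]) if is_safe(adj, current, w))
        remove_edge(adj, current, nxt)
        circuit.append(nxt)
        current = nxt
    return circuit

# Hypothetical example: two triangles sharing the vertex X.
bow_tie = [("X", "A"), ("A", "B"), ("B", "X"), ("X", "C"), ("C", "D"), ("D", "X")]
print(fleury(bow_tie, "X"))
print(fleury(bow_tie, "A"))
```

Starting from A, the walk reaches X with the edge to B still unused; that edge is refused at first because taking it would strand the other triangle, which is exactly the "unless removing E disconnects the graph" clause of Step 2.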


EXERCISES 


1. Sketch a graph whose adjacency matrix is

$$A = \begin{pmatrix} 0&1&2&1&0&1 \\ 1&0&0&1&2&0 \\ 2&0&0&0&1&1 \\ 1&1&0&0&2&2 \\ 0&2&1&2&0&1 \\ 1&0&1&2&1&0 \end{pmatrix}$$


How many paths of length 2 are there from vertex v2 to vertex v4? 


2. The following floor plan shows the ground level of a new home. Is it
possible to enter the house through the front door and exit through
the rear door, going through each internal doorway exactly once?
Model this problem with a suitable graph and give a reason for
your answer.

[Figure: floor plan showing the front door, the rear door, and the internal doorways.]


3. Consider the graph $G$ having adjacency matrix

$$A = \begin{pmatrix} 0&1&1&1&1 \\ 1&0&0&2&1 \\ 1&0&0&2&1 \\ 1&2&2&0&1 \\ 1&1&1&1&0 \end{pmatrix}$$


(a) Draw the graph. 
(b) Explain why G has an Eulerian circuit. 
(c) Find one. 


4. The map to the right illustrates a portion of a postal carrier's delivery
route. The dots indicate mailboxes into which mail must be delivered.
Find a suitable graph to represent the carrier's route. Is there an
Eulerian circuit? Is there an Eulerian trail?

[Figure: map of a portion of a postal carrier's delivery route, with dots marking mailboxes.]

5. We consider the following family of simple graphs $O_n$, $n = 1, 2, \ldots$,
defined as follows.


Vertices: The set of vertices is the set $\{\pm 1, \pm 2, \ldots, \pm n\}$.


Edges: The vertex $i$ is adjacent to the vertex $j$ precisely when
$|i| \neq |j|$.


(a) Draw the graphs $O_1$, $O_2$, $O_3$.
(b) What is the degree of every vertex in $O_n$?


(c) Is there an Eulerian circuit in $O_n$, $n > 1$?


6. We consider the family of graphs $C_n$, $n = 1, 2, \ldots$ defined as follows.

Vertices: The set of vertices is the set of binary sequences
$v = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)$, where each $\epsilon_i = 0$ or 1.


Edges: The vertex $v$ is adjacent to the vertex $w$ precisely when
the binary sequences defining $v$ and $w$ differ in exactly one
place.

The graph $C_3$ is indicated to the right.

[Figure: the graph $C_3$, with vertices (0,0,0), (1,0,0), (0,1,0), (0,0,1), (1,1,0), (1,0,1), (0,1,1), and (1,1,1).]


(a) What is the degree of each vertex in $C_n$, $n > 1$?


(b) How many paths are there from the vertex $(0, 0, \ldots, 0)$ to the
vertex $(1, 1, \ldots, 1)$?

7. Does the graph to the right have an
Eulerian circuit? If so, find it.


2.2.2 Hamiltonian cycles and optimization 


In the previous subsection we were largely concerned with the problem 
of moving around a graph in such a way that each edge is traversed 
exactly once. The present subsection is concerned with the “dual” 
problem, namely that of moving around a graph in such a way that 
each vertex is visited exactly once. Such a walk is called a Hamil- 
tonian path. If we return to the original vertex, the walk is called 
a Hamiltonian cycle. The following figure depicts a graph and a 
Hamiltonian cycle in the graph: 


[Figure: a graph and a Hamiltonian cycle from A in the graph.]


Curiously, unlike the question of the existence of Eulerian circuits,
there is no definitive simple criterion for the existence of Hamiltonian
cycles. Known results typically involve a lower bound on the degree of
each vertex.²⁹ See the exercises below for a few additional examples.


²⁹ For example, a 1952 theorem of Gabriel Dirac says that a graph with $n \geq 3$ vertices has a Hamiltonian
cycle provided that each vertex has degree $\geq n/2$. Øystein Ore generalized this result to graphs
(with $n \geq 3$ vertices) such that for each pair of non-adjacent vertices the sum of their degrees is $\geq n$.

Of more significance than just finding a Hamiltonian cycle in a simple
graph is that of finding a Hamiltonian cycle of least total weight
in a weighted graph. Such is the nature of the traveling salesman
problem. We start with a simple example.


EXAMPLE 1. A salesman needs to visit five cities in the American
Midwest: Chicago, Gary, Joliet, Merriville, and Highland. The cost
of travel between the cities is depicted in the graph to the right.³⁰

[Figure: complete weighted graph on the five cities Chicago, Gary, Merriville, Highland, and Joliet; the edge weights are the travel costs tabulated below.]


We display the costs in tabular form. It will be convenient to use
the letters A, B, C, D, and E to represent the cities. Notice that since
the matrix of entries is symmetric, there is no need to fill in all of the
entries.


               | A = Chicago | B = Gary | C = Merriville | D = Highland | E = Joliet
A = Chicago    |      *      |   $185   |      $119      |     $152     |    $133
B = Gary       |             |    *     |      $121      |     $150     |    $200
C = Merriville |             |          |       *        |     $174     |    $120
D = Highland   |             |          |                |      *       |    $199
E = Joliet     |             |          |                |              |     *


Assuming that the salesman will begin and end his trip in Chicago,
what is the optimal, i.e., cheapest route for him to take? That is,
which Hamiltonian cycle will afford the least total weight?

In order to answer this question, a few observations are in order.
First of all, a complete graph is one in which every pair of distinct
vertices is joined by an edge. Thus, the above graph is a (weighted)
complete graph. Next, it is obvious that in a complete graph with $n$
vertices, there are exactly $(n-1)!$ Hamiltonian cycles starting from a
given vertex. In the present example there are $4! = 24$ Hamiltonian
cycles.


³⁰ The numbers are taken from Example 2, page 201 of Excursions in Modern Mathematics, Fourth
Edition, by Peter Tannenbaum and Robert Arnold.

In order to find the Hamiltonian cycle of minimal weight, we shall
resort to the Brute-Force Method, that is we shall form a complete
list of the Hamiltonian cycles and their weights, choosing the one of
minimal weight as the solution of our problem. There is one final
simplification, namely, if the complete graph with vertices $\{v_1, v_2, \ldots, v_n\}$
is weighted, then the Hamiltonian cycle $(v_1, v_2, \ldots, v_n, v_1)$
clearly has the same weight as the "reverse cycle" $(v_1, v_n, v_{n-1}, \ldots, v_2, v_1)$.
Therefore the Brute-Force Method will require us to compare the weights
of $\frac{1}{2}(n-1)!$ Hamiltonian cycles.

We now list the weights of the Hamiltonian cycles in the above graph, 
highlighting the cycle of minimal weight. 


cycle  | weight                             | reverse cycle
ABCDEA | 185 + 121 + 174 + 199 + 133 = $812 | AEDCBA
ABCEDA | 185 + 121 + 120 + 199 + 152 = $777 | ADECBA
ABDCEA | 185 + 150 + 174 + 120 + 133 = $762 | AECDBA
ABDECA | 185 + 150 + 199 + 120 + 119 = $773 | ACEDBA
ABECDA | 185 + 200 + 120 + 174 + 152 = $831 | ADCEBA
ABEDCA | 185 + 200 + 199 + 174 + 119 = $877 | ACDEBA
ACBDEA | 119 + 121 + 150 + 199 + 133 = $722 | AEDBCA
ACBEDA | 119 + 121 + 200 + 199 + 152 = $791 | ADEBCA
ADBCEA | 152 + 150 + 121 + 120 + 133 = $676 | AECBDA   (minimal)
ADBECA | 152 + 150 + 200 + 120 + 119 = $741 | ACEBDA
AEBCDA | 133 + 200 + 121 + 174 + 152 = $780 | ADCBEA
AEBDCA | 133 + 200 + 150 + 174 + 119 = $776 | ACDBEA


As a result of the above computations we see that the minimal cost is 
for the salesman to visit the cities in the order 


Chicago → Highland → Gary → Merriville → Joliet → Chicago,


which results in a total cost of $676. In the next subsection we shall con- 
sider a few algorithms which can be used to determine “good” Hamil- 
tonian cycles if not the optimal Hamiltonian cycle. 
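
The Brute-Force Method for this example is small enough to run directly. The sketch below is an illustration (the names `COST`, `cost`, `cycle_weight`, and `brute_force` are assumptions made here); it uses the letters A through E for the cities as in the table above and skips the reverse of any cycle already examined, so that only 12 of the 24 cycles are compared.

```python
from itertools import permutations

# Travel costs from the table: A = Chicago, B = Gary, C = Merriville, D = Highland, E = Joliet.
COST = {
    ("A", "B"): 185, ("A", "C"): 119, ("A", "D"): 152, ("A", "E"): 133,
    ("B", "C"): 121, ("B", "D"): 150, ("B", "E"): 200,
    ("C", "D"): 174, ("C", "E"): 120,
    ("D", "E"): 199,
}

def cost(v, w):
    """Symmetric lookup of the cost between two cities."""
    return COST[(v, w)] if (v, w) in COST else COST[(w, v)]

def cycle_weight(cycle):
    """Total weight of a cycle given as a sequence that returns to its start."""
    return sum(cost(v, w) for v, w in zip(cycle, cycle[1:]))

def brute_force(start="A", others="BCDE"):
    """Examine every Hamiltonian cycle from `start`, up to reversal."""
    best = None
    for perm in permutations(others):
        if perm[0] > perm[-1]:      # the reverse cycle is examined instead
            continue
        cycle = (start,) + perm + (start,)
        if best is None or cycle_weight(cycle) < cycle_weight(best):
            best = cycle
    return best, cycle_weight(best)

print(brute_force())
```

The number of cycles grows like $(n-1)!/2$, so this approach is only practical for a handful of vertices.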


The above is an example of the Traveling Salesman Problem
(often abbreviated TSP) and is one of fundamental importance in Management
Science. It is also related to the so-called P = NP problem
(one of the Millennium problems)³¹ in that a general good (i.e., efficient)
solution of TSP would in fact prove that P = NP.


EXERCISES 


1. Two of the three graphs below have a Hamiltonian cycle. Determine
which two and in each case find a Hamiltonian cycle.

[Figure: three graphs, labeled (a), (b), and (c).]


2. Find a Hamiltonian cycle of minimal weight in the graph to the right.

[Figure: weighted graph.]


3. Let G be a complete graph having six vertices. Suppose that we
label each edge with either a 0 or a 1. Prove that in this graph
there must exist either


(a) three vertices among whose edges are all labeled "0," or


(b) three vertices among whose edges are all labeled "1."³²


³¹ See www.claymath.org/millinnium.

³² This is an elementary example of "Ramsey Theory." In general, the Ramsey number of a
complete graph with n vertices is the maximum number k such that an arbitrary labeling of the edges
(with 0s and 1s) of the graph will result in a subgraph with k vertices having all the edge labels 0 or

TSP: The nearest-neighbor algorithm


As indicated above, the brute-force method will always find the optimal 
solution, but the amount of computing time required may be astronom- 
ical (which is hardly optimal!). In this and the following sections we 
shall consider two very simple algorithms which don’t necessarily find 
the optimal solution but they are known to quickly find “good” solu- 
tions. 

The Nearest-Neighbor algorithm starts with a vertex in the 
weighted graph, and then proceeds to move to the “nearest neighbor” 
without prematurely returning to a previous vertex. 


EXAMPLE. In attempting to construct the cheapest route starting from
and returning to Chicago, we proceed as follows:


1. Move from Chicago to Merriville; the cost of $119 is the cheapest
among all costs involving travel from Chicago.


2. Move from Merriville to Joliet at a cost of $120; this is the cheapest cost (other
than $119, which puts us back in Chicago).


3. Move from Joliet to Highland at a cost of $199. 
4. Move from Highland to Gary at a cost of $150. 


5. Return to Chicago at a cost of $185. 
The total cost of the above Hamiltonian route is $773, which while not 
optimal was an easy route to obtain. 
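
The Nearest-Neighbor algorithm translates almost line for line into code. The following Python sketch is illustrative (the names `COST`, `cost`, and `nearest_neighbor` are assumptions made here, and ties are broken alphabetically, a choice the text does not specify); it uses the letters A = Chicago, B = Gary, C = Merriville, D = Highland, E = Joliet.

```python
# Travel costs for the five Midwestern cities, keyed by unordered pairs of letters.
COST = {
    ("A", "B"): 185, ("A", "C"): 119, ("A", "D"): 152, ("A", "E"): 133,
    ("B", "C"): 121, ("B", "D"): 150, ("B", "E"): 200,
    ("C", "D"): 174, ("C", "E"): 120,
    ("D", "E"): 199,
}

def cost(v, w):
    """Symmetric lookup of the cost between two cities."""
    return COST[(v, w)] if (v, w) in COST else COST[(w, v)]

def nearest_neighbor(start, vertices):
    """Repeatedly move to the cheapest unvisited vertex, then return to start."""
    tour = [start]
    unvisited = set(vertices) - {start}
    while unvisited:
        here = tour[-1]
        nxt = min(sorted(unvisited), key=lambda v: cost(here, v))
        tour.append(nxt)
        unvisited.remove(nxt)
    tour.append(start)
    return tour

tour = nearest_neighbor("A", "ABCDE")
print(tour, sum(cost(v, w) for v, w in zip(tour, tour[1:])))
```

Calling `nearest_neighbor` once for each starting vertex and keeping the cheapest result gives the repetitive variant described in the exercises.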
EXERCISES 


1. Consider the weighted graph with vertices A, B, C, D, and E,
having weights assigned as follows:

  | A | B  | C  | D  | E
A | * | 20 | 23 | 19 | 17
B |   | *  | 20 | 25 | 18
C |   |    | *  | 24 | 19
D |   |    |    | *  | [illegible]
E |   |    |    |    | *

Use the Nearest-Neighbor algorithm to find a Hamiltonian cycle
starting at vertex A. What is the total weight of this Hamiltonian cycle?

(Footnote 32, continued.) all the edge labels 1. The Ramsey number of the complete graph with six vertices is 3. In fact, one
way the above problem is often described is as follows: Show that among six people
there must be either three mutual friends or three mutual strangers.


2. Use the Nearest-Neighbor algorithm to find a Hamiltonian cycle 
starting at vertex A. What is the resulting total weight of this 


cycle? 

A|B|C|D\|\E|F 
A| *|4.7|5.1 | 3.6] 1.1] 0.8 
B 6. [Sie |e 1 | Ore 
C *) 8.1/5.9] 5.6 
D glee ales all 
E Plo 
F * 


3. There is a variation of the Nearest-Neighbor Algorithm which in- 
creases the computation time by a factor of the number of ver- 
tices of the weighted graph. This might seem stiff, but this added 
time pales by comparison with the time required to carry out the 
Brute-Force method. Namely, for each vertex of the weighted 
graph compute the Hamiltonian cycle constructed by the Nearest- 
Neighbor Algorithm, and then take the Hamiltonian cycle of least 
total weight. This is called the Repetitive Nearest-Neighbor 
algorithm. Do this for the above weighted graph consisting of 
travel among the given five Midwestern cities. 


TSP: The cheapest-link algorithm 


There is an alternative algorithm, the Cheapest-Link algorithm, which
efficiently computes a relatively cheap Hamiltonian cycle in a weighted
graph. This is easy to describe, as follows.

In the weighted graph start by choosing the edge of minimal weight
(the "cheapest link"). Next choose the next cheapest link, and so on.
As with the Nearest-Neighbor algorithm, we do not select any edges
which would prematurely result in a cycle. Also, we need to avoid any
edges which will result in more than two edges from a given vertex.


EXAMPLE. We consider this algorithm on the Midwestern Cities graph. 


1. Choose the {Chicago, Merriville} link as this is the cheapest among 
all links. 


2. Choose the {Merriville, Joliet} link; this is the second cheapest at 
$120. 


3. The third cheapest link is the {Gary, Merriville} link at $121; 
however, choosing this link will result in three edges issuing from 
Merriville. The fourth cheapest link is the {Chicago, Joliet} link 
at $133. However, this is also impossible as a premature cycle is 
formed. We settle for the {Gary, Highland} link at $150. 


4. We choose the {Chicago, Highland} link at $152. 


5. The only remaining choice given the constraints is the {Gary, 
Joliet} link at $200. 


The above algorithm produces the Hamiltonian cycle 


Chicago → Merriville → Joliet → Gary → Highland → Chicago,


at a total (non-optimal) cost of $741. 
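
The Cheapest-Link algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the names `COST` and `cheapest_link` are hypothetical, each vertex carries a component label so that a premature cycle is detected when both ends of a candidate edge already lie in the same component, and ties between equal weights are broken by the order of the table, which the text does not specify. As before, A = Chicago, B = Gary, C = Merriville, D = Highland, E = Joliet.

```python
COST = {
    ("A", "B"): 185, ("A", "C"): 119, ("A", "D"): 152, ("A", "E"): 133,
    ("B", "C"): 121, ("B", "D"): 150, ("B", "E"): 200,
    ("C", "D"): 174, ("C", "E"): 120,
    ("D", "E"): 199,
}

def cheapest_link(vertices, weights):
    """Pick edges in order of increasing weight, skipping any edge that would
    give a vertex three edges or close a cycle before all vertices are joined."""
    degree = {v: 0 for v in vertices}
    component = {v: v for v in vertices}   # label of the path each vertex lies on
    chosen = []
    for (v, w), c in sorted(weights.items(), key=lambda item: item[1]):
        if degree[v] == 2 or degree[w] == 2:
            continue                        # would create a third edge at a vertex
        if component[v] == component[w] and len(chosen) < len(vertices) - 1:
            continue                        # would close a premature cycle
        chosen.append((v, w, c))
        degree[v] += 1
        degree[w] += 1
        old, new = component[w], component[v]
        for x in component:
            if component[x] == old:
                component[x] = new
        if len(chosen) == len(vertices):
            break                           # the Hamiltonian cycle is complete
    return chosen

links = cheapest_link("ABCDE", COST)
print(links, sum(c for _, _, c in links))
```

The edges come out in the order in which they were selected, not in the order in which the cycle traverses them.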


The algorithms above are what are called greedy algorithms, as at
each stage they seek the optimal (i.e., cheapest) choice.


EXERCISES 


1. Apply the Cheapest-Link algorithm to the graph indicated in the 
table in Exercise 1 on page 121. 


2. Apply the Cheapest-Link algorithm to the graph indicated in the
table in Exercise 2 on page 122.


2.2.3 Networks and spanning trees


In this subsection we consider a problem similar to TSP but different 
in the sense that efficient and optimal solutions are possible. The basic 
idea is this: suppose, for example, that we have the weighted graph 


[Figure: weighted graph on the three vertices A, B, and C.]


We need for these three points to be “networked,” i.e., in communi- 
cation with each other, but without any redundancy. In other words, 
we don't need all three of the edges in the above graph because if A
is networked with B, and B is networked with C, then A is networked
with C: there is a “transitivity” of networking. Therefore, the above 
idealized networking problem would be optimized by discarding the re- 
dundant (and most expensive) edge so that the sum of the remaining 
edge weights becomes a minimum. 


Let us flesh out the above with a somewhat more detailed problem. 


EXAMPLE 1. Assume that we have cities A, B, C, D, and E and that
they can be networked according to the costs depicted in the weighted
graph to the right.

[Figure: weighted graph on the vertices A, B, C, D, and E, with edge
weights 3.1, 3.0, 2.5, 2.6, 2.4, 1.9, 3.3, and 1.8.]


What we are looking for is a network which will result in the cities
being interconnected but without any redundancy. Also, we are looking
to do this with the least possible cost. The first condition simply states
that we are looking for a connected “subgraph” of the above graph
containing all of the vertices but without having any cycles in it. Such
is called a spanning tree of the graph.³³ The second says that we are
looking for a minimal-weight spanning tree.


Before continuing with the above example, a few comments are in 
order. First of all, a given graph is called a tree if it is connected, has 
no multiple edges, and contains no cycles. Therefore, in particular, a 
tree is a simple graph. We shall prove a couple of results about trees. 


LEMMA. Let G be a tree, and let E be an edge in G. Then the removal
of E results in a disconnected graph.


PROOF. Let E be on the vertices v and w. If the removal of E doesn’t
disconnect G then there is a path from v to w without using the edge
E. Since we can also get from v to w via E, we clearly have a cycle in
the graph G, contradicting the assumption that G is a tree. Therefore,
the removal of E must result in disconnecting G.


THEOREM. Let G be a finite simple connected graph containing n 
vertices. Then G is a tree if and only if G has n — 1 edges. 


PROOF. Assume that G is a finite tree and fix a vertex v in G. For 
any vertex w in G denote by d(v,w) (the distance from v to w) the 
length of the shortest path from v to w. Since G only has finitely many 
vertices, there must exist a vertex v’ of maximal distance from v. 


CLAIM: $v'$ has only one edge on it, i.e., $v'$ is an end in the tree G.
Assume that $d(v, v') = d$ and let

\[ v = v_0,\ v_1,\ v_2,\ \ldots,\ v_d = v' \]


³³ It is easy to see that any connected finite graph contains a spanning tree. Indeed, suppose that
the tree T is a subgraph of the connected graph G having a maximal number of vertices. If these
aren’t all of the vertices of G, then by the connectivity of G one of the vertices of the tree must be
adjacent to a new vertex in G. Adding this vertex (and the corresponding edge) creates a larger
tree inside G, a contradiction. (Even if the graph has an infinite number of vertices, there still must
exist a spanning tree. The proof, however, uses what’s called Zorn’s Lemma and is outside the
scope of these notes.)
be a path from $v$ to $v'$, where each $\{v_{i-1}, v_i\}$ is an edge in G.
Assume that $v'$ is adjacent to another vertex $v''$ (other than
$v_{d-1}$). If a minimal length path from $v$ to $v''$ must travel through
$v'$, then $v''$ must be of greater distance from $v$ than is $v'$. This
can’t happen and so there must be a path from $v$ to $v''$ which doesn’t
pass through $v'$. But with $\{v', v''\}$ being an edge, we see that it is
possible to construct a cycle through $v'$, which is a contradiction.


We now may remove the vertex $v'$ and the unique edge $e$ on $v'$ from
the graph G; what results is clearly a tree with $n - 1$ vertices. Using
induction, we may conclude that this tree must have $(n - 1) - 1 = n - 2$
edges. If we replace the removed vertex and edge, we arrive at the
conclusion that G itself must have $n - 1$ edges.


Conversely, assume that G is a connected graph with $n$ vertices and
$n - 1$ edges.


CLAIM: The graph G must contain an end vertex v. If not then each
vertex of G must sit on at least two edges, and so

\[ \#\,\text{edges in } G \;=\; \frac{1}{2} \sum_{\text{vertices } v \text{ in } G} (\#\,\text{edges on } v) \;\ge\; n, \]


which is a contradiction. Therefore, G must contain an end vertex v. 


We now remove the end vertex v and the single edge containing v
from the graph G. This results in a connected graph $G'$ consisting of
$n - 1$ vertices and $n - 2$ edges. Again using mathematical induction we
conclude that $G'$ must, in fact, be a tree. But then adding v and the
single edge to $G'$ will certainly produce a tree, and we’re done.
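
The theorem gives a quick computational test for being a tree: count the edges and check connectivity. The following is only a minimal sketch in Python, assuming the vertices are numbered 0, 1, ..., n − 1 (with n ≥ 1) and that the graph is simple; the function name `is_tree` and the sample graphs are ours, chosen purely for illustration.

```python
def is_tree(n, edges):
    """Return True if the simple graph on vertices 0..n-1 is a tree.

    By the theorem, a connected graph on n vertices is a tree exactly
    when it has n - 1 edges, so we test both conditions.
    """
    if len(edges) != n - 1:
        return False
    # Build adjacency lists.
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    # Depth-first search from vertex 0 to test connectivity.
    seen = {0}
    stack = [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


# Hypothetical examples (not taken from the text):
star = [(0, 1), (1, 2), (1, 3)]                 # a tree on 4 vertices
triangle_plus_point = [(0, 1), (1, 2), (0, 2)]  # 3 edges, but vertex 3 is isolated
print(is_tree(4, star))                 # True
print(is_tree(4, triangle_plus_point))  # False
```

The second example shows why connectivity cannot be dropped from the hypothesis: the edge count is right, yet the graph contains a cycle.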
EXAMPLE 1, CONTINUED. We shall return to the above example only long
enough to indicate a minimal-weight spanning tree. In the next
subsection we shall indicate an efficient method to derive this optimal
solution.

[Figure: minimal-weight spanning tree of the graph in Example 1; total
weight = 8.7.]


EXERCISES 


1. Construct (by drawing) a spanning tree in each of the graphs de- 
picted below. 


[Figure: four graphs, labeled (a), (b), (c), and (d).]


2. Can you give a simple example of a graph which has no Hamilto- 
nian cycle? 


3. Indicate a Hamiltonian cycle in the 
graph to the right (if one exists). 
Kruskal’s algorithm


Kruskal’s algorithm is the same in spirit as the Cheapest-Link
algorithm for finding minimal-weight Hamiltonian cycles. However,
the surprising difference is that whereas the Cheapest-Link algorithm
doesn’t always find the minimal-weight Hamiltonian cycle, Kruskal’s al-
gorithm will always find a minimal-weight spanning tree in a connected
weighted graph.

The algorithm is implemented by selecting in turn the edges of min- 
imal weight—and hence is a greedy algorithm—disregarding any choice 
that creates a circuit in the graph. The algorithm ends when a spanning 
tree is obtained. 


We indicate in steps how the minimal-weight spanning tree for the
example on page 127 was obtained (notice that we couldn’t choose the
edge with weight 2.5, as this would create a cycle):

[Figure: the successive steps of Kruskal’s algorithm applied to the
example.]
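
The selection rule just described can be sketched in a few lines of Python. This is only an illustrative sketch, not a definitive implementation: the function name `kruskal`, the "union-find" bookkeeping used to detect cycles, and the small sample graph are all our own assumptions (in particular, the sample graph is hypothetical and is not the graph of Example 1).

```python
def kruskal(vertices, edges):
    """Return (tree_edges, total_weight) for a minimal-weight spanning tree.

    `edges` is a list of (weight, u, v) triples of an undirected graph.
    """
    # Each vertex starts in its own component (union-find forest).
    parent = {v: v for v in vertices}

    def find(v):
        # Follow parent links up to the representative of v's component.
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # shorten the path as we go
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(edges):          # cheapest edges first (greedy)
        ru, rv = find(u), find(v)
        if ru != rv:                       # different components: no cycle
            parent[ru] = rv                # merge the two components
            tree.append((u, v, w))
        if len(tree) == len(vertices) - 1:
            break                          # spanning tree is complete
    return tree, sum(w for _, _, w in tree)


# Hypothetical weighted graph on four vertices (for illustration only).
vertices = ["A", "B", "C", "D"]
edges = [(1, "A", "B"), (2, "B", "C"), (3, "A", "C"),
         (4, "C", "D"), (5, "B", "D")]
tree, total = kruskal(vertices, edges)
print(tree)   # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 4)]
print(total)  # 7
```

Note that the edge of weight 3 is skipped because A and C already lie in the same component, exactly as the edge of weight 2.5 was skipped in the example.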
EXERCISES


1. Find a minimal spanning tree for
the graph on the right.

[Figure: weighted graph for Exercise 1, with edge weights 19, 25, 19,
23, 26, 23, 20.]


2. The table to the right describes a graph. An asterisk (*) in-
dicates an edge of infinite weight.
Use Kruskal’s algorithm to find a
minimal-weight spanning tree.

A|BIC|D|\|E|F\G 
A * 5 8 74 * *k *k 
B 1), al) || abe | ee|, 7 
C Pe a ee 
D *K *K *K 9) 
EB 3 | 1 
PF umes: 
G *K 


3. (Efficient upper and lower bounds for Hamiltonian cy- 
cles of minimal weight) In this exercise we show how to obtain 
reasonable upper and lower bounds for the minimal weight of a 
Hamiltonian cycle in a weighted graph G. 


(a) (Lower bound) Notice that if we remove an edge from a 
Hamiltonian cycle we get a spanning tree. Therefore do this: 


i. Delete a vertex $v$ and all the edges incident with $v$ from
the graph; call the resulting graph $G_v$.


ii. Use Kruskal’s algorithm to find a minimal spanning tree
for $G_v$. Let the total weight of this tree be $W_v$.


iii. Replace the vertex $v$ and two of the cheapest edges on $v$.


Show that $W_v + W \le$ total weight of a minimal-weight Hamil-
tonian cycle, where W denotes the sum of the weights of the
two edges found in (iii), above. Thus we have efficiently ob-
tained a lower bound for the total weight of a minimal-weight
Hamiltonian cycle.


(b) (Upper bound) Use one of the efficient methods above (Nearest- 
neighbor or cheapest-link algorithm) to find a Hamiltonian 
cycle. The weight is then an upper bound. 


Prim’s algorithm 


Like Kruskal’s algorithm, Prim’s algorithm is an efficient method 
for finding a minimal-weight spanning tree in a weighted graph. We 
describe this as follows. Assume that the given weighted graph is G. 
For convenience, we shall initially regard all of the vertices and edges 
in G as colored black. 


STEP 1. Pick an initial vertex $v_1$. Color this vertex red.


STEP 2. Find a vertex $v_2$ of minimal distance (weight) to $v_1$. Color
the vertex $v_2$ and the edge $\{v_1, v_2\}$ red.


STEP 3. Choose a new vertex $v_3$ of minimal distance to either $v_1$ or
$v_2$. Color the new vertex $v_3$ and the corresponding minimal-length
edge red.


STEP n. Repeated application of the above will determine a red sub-
tree of G with vertices $v_1, v_2, \ldots, v_{n-1}$. Find a black edge of mini-
mal weight joining one of the above $n - 1$ vertices to a black vertex.
Color this edge and the new vertex $v_n$ which it determines red.


CONCLUSION. Continue until all vertices in G have been colored red; 
the resulting red graph is a minimal-weight spanning tree. 
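
The coloring procedure above translates readily into code. Below is a minimal sketch in Python, under the assumption that the graph is connected and stored as a dictionary of dictionaries; the name `prim`, the use of a priority queue (`heapq`) to find the cheapest black edge, and the sample graph are our own illustrative choices, not part of the original description.

```python
import heapq


def prim(graph, start):
    """Return (tree_edges, total_weight) of a minimal-weight spanning tree.

    `graph[u][v]` is the weight of the undirected edge {u, v}.
    """
    red = {start}                       # STEP 1: color the start vertex red
    tree = []
    # Candidate edges leaving the red subtree, cheapest first.
    frontier = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(frontier)
    while frontier and len(red) < len(graph):
        w, u, v = heapq.heappop(frontier)
        if v in red:
            continue                    # both ends already red: skip
        red.add(v)                      # color the new vertex red ...
        tree.append((u, v, w))          # ... together with the edge
        for x, wx in graph[v].items():
            if x not in red:
                heapq.heappush(frontier, (wx, v, x))
    return tree, sum(w for _, _, w in tree)


# Hypothetical weighted graph (for illustration only).
graph = {
    "A": {"B": 1, "C": 3},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 3, "B": 2, "D": 4},
    "D": {"B": 5, "C": 4},
}
tree, total = prim(graph, "A")
print(tree)   # [('A', 'B', 1), ('B', 'C', 2), ('C', 'D', 4)]
print(total)  # 7
```

Whereas Kruskal’s algorithm may grow several disconnected pieces that merge later, the red edges in Prim’s algorithm always form a single tree.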


EXERCISES 


1. Use Prim’s algorithm to find minimal spanning trees in the first 
two exercises on page 129. 
2. Use Prim’s algorithm to find a minimal spanning tree in the graph
below:

[Figure: weighted graph for Exercise 2.]


3. Use the methods of Exercise 3 on page 129 to find upper and lower 
bounds on the weight of a Hamiltonian cycle in the above graph. 


Weighted directed graphs; Dijkstra’s algorithm 


In many applications of graph theory, one notices that the cost of mov-
ing from vertex $v_1$ to the vertex $v_2$ might be different from the cost of
moving from $v_2$ to the vertex $v_1$.³⁴ Such a graph is called a weighted
directed graph. Of interest in this setting is to find the minimal
weight (cost) in getting from an initial vertex (call it $v_0$) to some other
vertex $v$.³⁵


Below is depicted a weighted directed graph: 


³⁴ For example, the price of an airline ticket from Shanghai to Beijing is typically (slightly) less than
the price of a ticket from Beijing to Shanghai.
³⁵ This is sometimes called the minimal connector problem.
[Figure: a weighted directed graph with vertices labeled A through J.]


Of course, a weighted graph can be thought of as being directed 
where the weights are the same in both directions. 


Dijkstra’s algorithm³⁶ constructs in a graph G a directed tree start-
ing from the vertex $v_0$ such that the minimal-weight path from $v_0$ to
any other vertex $v$ can be found by moving from $v_0$ to $v$ along this
tree. The description of the algorithm proceeds as follows. We shall
assume, for convenience, that all directed edges are initially drawn as
“dotted directed edges.” Also, each vertex shall initially carry a tem-
porary label, to be replaced eventually with a permanent label (which
will represent the minimal distance from the initial vertex $v_0$). (Caution:
the temporary labels can change during the algorithm!)


We’ll use the graph to the right to illustrate Dijkstra’s algorithm.

[Figure: the weighted directed graph used in the illustration.]


We now itemize the steps in Dijkstra’s algorithm. 


³⁶ There are a couple of really nice applets demonstrating Dijkstra’s algorithm:
http://www.dgp.toronto.edu/people/JamesStewart/270/9798s/Laffra/DijkstraApplet.html
http://www-b2.is.tokushima-u.ac.jp/~ikeda/suuri/dijkstra/Dijkstra.shtml
STEP 1. Find the vertices $v$ in G such that $(v_0, v)$ is a directed
edge. Temporarily mark these vertices with their weighted distances
from $v_0$.


STEP 2. Fill in the edge connecting $v_0$ to the vertex $v_1$ of minimal
distance from $v_0$; the temporary label at $v_1$ is now a permanent
label.


STEP 3. Find all new vertices connected to $v_1$; temporarily mark
these vertices with their distances from $v_0$ through $v_1$.


STEP 4. Select a vertex $v_2$ having a minimal weight label; color in
the directed edge and make the label permanent. (Note that in the
event that there is more than one vertex of minimal distance, the
choice is arbitrary.)


STEP 5. Find all new vertices connected to $v_2$; mark these with their
distances from $v_0$ (along solid directed edges) and through $v_2$. If
such a vertex already has a temporary label, overwrite this label if
the distance through $v_2$ is less than the existing label. (This is
where a label can change! If there are no new vertices, go to the
next step.)


STEP 6 AND BEYOND. Choose the vertex having the minimal temporary
label. Color in the directed edge and make the label permanent. Keep
repeating this process until all vertices have permanent labels. The
darkened directed edges determine a directed tree through which
minimal weight paths are determined.
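
The labeling steps above can be sketched in Python as follows. This is a hedged illustration only: it assumes all weights are non-negative, the names `dijkstra` and `path_to` are ours, a priority queue (`heapq`) stands in for "choose the vertex having the minimal temporary label," and the sample directed graph is hypothetical rather than the one pictured in the text.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, pred) for minimal-weight paths from `source`.

    `graph[u][v]` is the (non-negative) weight of the directed edge (u, v).
    `dist[v]` is the final label of v; `pred[v]` is the vertex from which
    the darkened edge into v starts.
    """
    dist = {source: 0}      # current labels (temporary until made permanent)
    pred = {}
    permanent = set()
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)      # minimal temporary label
        if u in permanent:
            continue                    # outdated entry: label already fixed
        permanent.add(u)                # the label of u becomes permanent
        for v, w in graph.get(u, {}).items():
            nd = d + w                  # distance to v through u
            if v not in dist or nd < dist[v]:
                dist[v] = nd            # new label, or overwrite a larger one
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, pred


def path_to(pred, source, target):
    """Walk the darkened edges backwards from `target` to `source`."""
    path = [target]
    while path[-1] != source:
        path.append(pred[path[-1]])
    return path[::-1]


# Hypothetical weighted directed graph (for illustration only).
graph = {"A": {"B": 7, "C": 2}, "C": {"B": 3, "D": 8}, "B": {"D": 1}, "D": {}}
dist, pred = dijkstra(graph, "A")
print(dist["D"])                 # 6
print(path_to(pred, "A", "D"))   # ['A', 'C', 'B', 'D']
```

In this run the temporary label 7 at B is overwritten by 5 once C becomes permanent, which is precisely the label change cautioned about above.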


EXERCISE. 


1. Use Dijkstra’s algorithm to find a minimal-weight path from vertex 
A to vertex J in the graph on page 131. 


2.2.4 Planar graphs 


Two graphs $G_1$ and $G_2$ are isomorphic if there is a one-to-one and
onto function

\[ f : \text{vertices of } G_1 \longrightarrow \text{vertices of } G_2 \]

such that $\{f(v_1), f(w_1)\}$ is an edge of $G_2$ exactly when $\{v_1, w_1\}$ is an
edge of $G_1$. In other words, two graphs are isomorphic exactly when
one is simply a redrawing of the other. A moment’s thought reveals
that the two graphs depicted below are isomorphic.


Assume that $G_1$ and $G_2$ are graphs and that

\[ f : \text{vertices of } G_1 \longrightarrow \text{vertices of } G_2 \]

determines an isomorphism between these graphs. If $v_1$ is a vertex of
$G_1$, and if $v_2 = f(v_1)$, it should be instantly clear that $v_1$ and $v_2$ have
the same degree. However, this condition isn’t sufficient; see Exercise
1 on page 141.
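
To see concretely that matching degrees do not force an isomorphism, one can compare two small graphs by machine. The sketch below, in Python, uses two hypothetical graphs of our own choosing (a hexagon and a pair of disjoint triangles, on vertices 0 through 5); the helper names `degree_sequence` and `count_components` are likewise ours.

```python
def degree_sequence(n, edges):
    """Sorted list of vertex degrees for a graph on vertices 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg)


def count_components(n, edges):
    """Number of connected components, found by depth-first search."""
    adj = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, components = set(), 0
    for start in range(n):
        if start in seen:
            continue
        components += 1
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
    return components


hexagon = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
two_triangles = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]

# Every vertex has degree 2 in both graphs ...
print(degree_sequence(6, hexagon) == degree_sequence(6, two_triangles))  # True
# ... yet one graph is connected and the other is not.
print(count_components(6, hexagon), count_components(6, two_triangles))  # 1 2
```

Since an isomorphism carries paths to paths, it preserves the number of connected components, so these two graphs cannot be isomorphic.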


There are two important families of graphs that warrant special con-
sideration. The first is the family of complete graphs $K_1, K_2, K_3, \ldots$
(see also page 118). The graph $K_n$ is the simple graph (see page 109)
having n vertices and such that every vertex is adjacent to every other
vertex.
The next important family involves the so-called bipartite graphs.
The simple graph G is called bipartite if its set V of vertices can
be partitioned into two disjoint subsets $V = V_1 \cup V_2$ where there are
no edges among the vertices of $V_1$ and there are no edges among the
vertices of $V_2$.

[Figure: a bipartite graph with vertex sets $V_1$ and $V_2$.]


The complete bipartite graph $K_{m,n}$, where m and n are positive
integers, is the bipartite graph with vertices $V = V_1 \cup V_2$, $|V_1| = m$ and
$|V_2| = n$, and where every vertex of $V_1$ is adjacent with every vertex of
$V_2$ (and vice versa).


We turn now to the main topic of this section, that of planar
graphs.³⁷ These are the graphs which can be “faithfully” drawn in
the plane. By “faithful” we mean that the edges drawn between ver-
tices will have no crossings in the plane. As a simple example, we
consider below two versions of the graph of the cube: the first is how
we usually imagine it in three-dimensional space, and the second is how
we could draw it in the plane.

[Figure: the graph of the cube, drawn in three-dimensional space and
redrawn in the plane without crossings.]


EXAMPLE 1. The complete graphs $K_1, K_2, K_3, K_4$ are obviously pla-
nar graphs. However, we shall see below that $K_5$ is not planar; in fact,
none of the complete graphs $K_n$, $n \ge 5$, is planar. The complete
bipartite graph $K_{3,3}$ is also not planar (try it!). (We’ll prove below that
$K_{3,3}$ is not planar.)


³⁷ The topic of planar graphs falls into the general category of “topological graph theory.”
There are two fundamental theorems which give criteria for a graph
to be planar. They’re relatively deep results, so we won’t give proofs
here. The first result makes use of the notion of “homeomorphism” of
graphs. Namely, two graphs are homeomorphic if one can be obtained
from the other simply by adding vertices along existing edges. However,
no new edges can be added!

[Figure: homeomorphic graphs.]


THEOREM. (Kuratowski’s Theorem) A finite graph G is planar if and
only if G has no subgraph homeomorphic to the complete graph $K_5$
on five vertices or the complete bipartite graph $K_{3,3}$.


From Kuratowski’s theorem we can deduce that the Petersen graph is
not planar. Indeed, the sequence below shows that the Petersen graph
has a subgraph which is homeomorphic with the complete bipartite
graph $K_{3,3}$.

[Figure: the Petersen graph; a subgraph; a homeomorph; a redrawing as
$K_{3,3}$, with vertices labeled 1 through 6.]


The next planarity condition is somewhat more useful but slightly
more technical. First of all, a graph H is called a minor of the graph
G if H is isomorphic to a graph that can be obtained by a number of
edge contractions on a subgraph of G. Look at the so-called Petersen
graph; it contains $K_5$ as a minor:

[Figure: the Petersen graph; $K_5$ results by contracting edges.]


THEOREM. (Wagner’s Theorem) A finite graph G is planar if and only
if it does not have $K_5$ or $K_{3,3}$ as a minor.


As a result, we see that the Petersen graph is not planar. 
Euler’s formula and consequences


Once a planar graph has been drawn in the plane, it not only de-
termines vertices and edges, it also determines faces. These are the
2-dimensional regions (exactly one of which is unbounded) which are
bounded by the edges of the graph. The plane, together with a graph
faithfully drawn in it, is called a planar map. Thus, a planar map has
the vertices and edges of the “embedded” graph G; it also has faces.


EXAMPLE 2. We look at the cube graph drawn in the plane. Notice
that there are 6 naturally defined regions, or faces.

[Figure: the cube graph drawn in the plane, with faces $f_1$ (the
infinite face), $f_2$, $f_3$, $f_4$, $f_5$, $f_6$.]


EXAMPLE 3. Here is a more irregular planar graph with the faces in-
dicated. Also, we have computed

\[ \#\text{vertices} - \#\text{edges} + \#\text{faces} = 2; \]

this is a fundamental result.

[Figure: an irregular planar graph with faces $f_1$ (the infinite face),
$f_2$, $f_3$, $f_4$; here $v - e + f = 11 - 13 + 4 = 2$.]


If we compute #vertices — #edges + #faces for the planar map in 
Example 2 above, we also get 2. There must be something going on 


here! We start by defining the Euler characteristic of the planar map
M by setting

\[ \chi(M) = \#\text{vertices} - \#\text{edges} + \#\text{faces}. \]

The surprising fact is that the number always produced by the above
is 2:


THEOREM. (Euler’s Formula) If M is a connected planar map, then
$\chi(M) = 2$.


PROOF. Let T be a spanning tree inside the underlying graph G of M;
the existence of T was proved in the footnote on page 125. Note that
since T has no cycles, there can only be one face: $f = 1$. Next, we know
by the theorem on page 125 that $v = e + 1$. Therefore, we know already
that $\chi(T) = v - e + f = 1 + 1 = 2$. Next, we start adding the edges of G
to the tree T, noting that each additional edge divides an existing face
in two. Therefore the expression $v - e + f$ doesn’t change as e and f
have both increased by 1, proving the result.³⁸
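
The bookkeeping in this proof is easy to mimic by machine. The following Python sketch (the function names are ours, and the code assumes it is handed the vertex and edge counts of a connected planar map) starts from a spanning tree and adds the remaining edges one at a time, counting faces as in the proof; the two sample calls use the counts from Examples 2 and 3.

```python
def euler_characteristic(v, e, f):
    """The quantity #vertices - #edges + #faces."""
    return v - e + f


def faces_of_connected_planar_map(v, e):
    """Number of faces, counted as in the proof of Euler's formula."""
    tree_edges = v - 1          # a spanning tree has v - 1 edges ...
    faces = 1                   # ... and a single (infinite) face
    for _ in range(e - tree_edges):
        faces += 1              # each further edge splits one face in two
    return faces


# Cube graph (Example 2): 8 vertices, 12 edges.
f_cube = faces_of_connected_planar_map(8, 12)
print(f_cube, euler_characteristic(8, 12, f_cube))   # 6 2

# The map of Example 3: 11 vertices, 13 edges.
f_ex3 = faces_of_connected_planar_map(11, 13)
print(f_ex3, euler_characteristic(11, 13, f_ex3))    # 4 2
```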


COROLLARY. For the simple planar map M with $v \ge 3$ vertices, we have
$e \le 3v - 6$.


PROOF. We may assume that M has at least three edges, for otherwise
the underlying graph is a tree, where the result is easy. This easily
implies that each face, including the infinite face, will be bounded by
at least three edges. Next, notice that an edge will bound either a
single face or two faces. If the edge e bounds a single face, then the
largest connected subgraph containing e and whose edges also bound a
single face is (after a moment’s thought) seen to be a tree. Removing
all edges of this tree and all vertices sitting on edges bounding a single
face will result in removing the same number of vertices as edges. On
the map $M'$ which remains every edge bounds exactly two faces. Also,
the number f of faces of $M'$ is the same as the number of faces of
the original map M. Let $v'$, $e'$ be the number of vertices and edges,
respectively, of $M'$. Since every face of $M'$ is bounded by at least three
edges, and since every edge bounds exactly two faces of $M'$, we infer
that $3f \le 2e'$. Therefore,

\[ 2 = v' - e' + f \le v' - e' + 2e'/3 = v' - e'/3, \]


³⁸ In the most general setting, the Euler characteristic of a graph is a function of where it’s
faithfully drawn. For example, it turns out that the Euler characteristic of a graph faithfully drawn
on the surface of a doughnut (a “torus”) is always 0. See also the footnote on page 196.
from which it follows that $e' \le 3v' - 6$. However, $e' = e - k$ and
$v' = v - k$ for some fixed non-negative integer k, from which we infer
that $e \le 3v - 6$.


EXAMPLE 4. From the above result, we see immediately that the
complete graph $K_5$ is not planar as it has $\binom{5}{2} = 10$ edges, which is
greater than $3v - 6 = 9$.


If we have a planar bipartite graph, then the above result can be 
strengthened: 


COROLLARY. Let M be a simple planar map with no triangles. Then
we have $e \le 2v - 4$.


PROOF. As in the above proof, we may assume that each edge bounds
two faces and that each face, including the infinite face, will be
bounded by at least four edges (there are no triangles). This implies
that $4f \le 2e$. Therefore,

\[ 2 = v - e + f \le v - e + e/2 = v - e/2, \]

and so $e \le 2v - 4$ in this case.


EXAMPLE 5. From the above result, we see immediately that the
complete bipartite graph $K_{3,3}$ is not planar. Being bipartite, it cannot
have any triangles (see Exercise 5); furthermore, it has 9 edges, which
is greater than $2v - 4 = 8$.
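
The two corollaries give quick arithmetic tests that can rule out planarity. The Python sketch below (the function name `violates_planar_bound` is ours, and it assumes a simple graph with at least three vertices) applies them to $K_5$ and $K_{3,3}$; the vertex and edge counts used for the Petersen graph (10 vertices, 15 edges, no triangles) are standard facts rather than something computed here.

```python
from math import comb


def violates_planar_bound(v, e, triangle_free=False):
    """True if the edge count alone shows the graph cannot be planar.

    Uses e <= 3v - 6 in general, and e <= 2v - 4 when there are no triangles.
    """
    bound = 2 * v - 4 if triangle_free else 3 * v - 6
    return e > bound


# K_5: 5 vertices and C(5,2) = 10 edges.
print(violates_planar_bound(5, comb(5, 2)))               # True
# K_{3,3}: 6 vertices, 9 edges, no triangles.
print(violates_planar_bound(6, 9, triangle_free=True))    # True
# Without the triangle-free bound, K_{3,3} is not detected.
print(violates_planar_bound(6, 9))                        # False
# Petersen graph: 10 vertices, 15 edges, no triangles.
print(violates_planar_bound(10, 15, triangle_free=True))  # False
```

The last line is a reminder that these inequalities are only necessary conditions: the Petersen graph passes both tests, yet it is not planar.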


EXERCISES 


1. Show that even though the degree of each vertex in both graphs 
below is 3, these graphs are not isomorphic. 
2. Here’s a slightly more sophisticated problem. Define the graphs $G_1$
and $G_2$ as follows. Start by letting n be a fixed positive integer.


Vertices of $G_1$: These are the subsets of $\{1, 2, \ldots, n\}$.
Edges of $G_1$: $\{A, B\}$ is an edge of $G_1$ exactly when

\[ |A \cap B| = \max\{|A| - 1,\ |B| - 1\}. \]

(Notice that this says that either $A \subset B$ and $|B| = |A| + 1$ or
that $B \subset A$ and that $|A| = |B| + 1$.)


Vertices of $G_2$: These are the binary sequences $v = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)$,
where each $\epsilon_i = 0$ or 1.


Edges of $G_2$: $\{v, w\}$ is an edge of $G_2$ precisely when the binary
sequences defining v and w differ in exactly one place. (This
is the graph defined in Exercise 6 on page 116.)


Show that the graphs $G_1$ and $G_2$ are isomorphic.


3. Assume that a graph G can be “faithfully” drawn on the surface 
of a sphere. Must this graph be planar? 


4. Consider the “grid graph,” constructed as follows. Let m and n
be positive integers and in the coordinate plane mark the points
having integer coordinates $(k, l)$ such that $0 \le k \le m$ and
$0 \le l \le n$. These are the vertices of the graph G. The edges in this
graph connect the vertices separated by Euclidean distance 1. Show
that this graph is bipartite.
5. Prove that any cycle in a bipartite graph must have even length.
Conversely, if every cycle in a graph has even length, show that
the graph must be bipartite.


6. How many edges are there in the complete bipartite graph $K_{m,n}$?


7. Let G be a finite simple graph (see page 109) of n vertices in which
every vertex has degree k. Find a simple formula for the
number of edges in terms of n and k.


8. Let G be a planar graph. Prove that G must contain a vertex
whose degree is at most 5.


9. Use the result of Exercise 8 to show that any planar graph can be
6-colored. That is to say, if G is a planar graph then using only
six colors we can color the vertices of G in such a way that no two
adjacent vertices have the same color.³⁹


10. Prove that none of the complete graphs $K_n$, $n \ge 5$, is planar.


11. Let G be a planar graph and let M be the map it determines by an
embedding in the plane. We define the dual graph $G^*$ (relative
to the map M) as follows. The vertices of $G^*$ are the faces of M.
Next, for each edge of G we draw an edge between the two faces
bounded by this edge. (If this edge bounds a single face, then
a loop is created.) Show (by drawing a picture) that even when
every edge bounds two faces, the dual graph might not be a
simple graph even when G is a simple graph.


12. Let G be a planar graph, embedded in the plane, resulting in the
map M. Let $G^*$ be the dual graph relative to M. Let T be a
spanning tree in G and consider the subgraph $T^*$ of $G^*$ to have all
the vertices of $G^*$ (i.e., all the faces of M) and to have those edges
which correspond to edges in G but not in T.


³⁹ Of course, the above result isn’t “best possible.” It was shown in 1976 by K. Appel and W.
Haken that any planar map can be 4-colored. For a nice online account, together with a sketch
of a new proof (1997) by N. Robertson, D.P. Sanders, P.D. Seymour, and R. Thomas, see
http://www.math.gatech.edu/~thomas/FC/fourcolor.html. Both of the above-mentioned proofs are
computer aided.
It is not too difficult to prove that a planar graph can be 5-colored; see M. Aigner and G.M.
Ziegler, Proofs from the Book, Third Edition, Springer, 2004, pages 200-201.




(a) Show that $T^*$ is a spanning tree in $G^*$.

(b) Conclude that $v = e_T + 1$ and $f = e_{T^*} + 1$, where $e_T$ is the
number of edges in T and $e_{T^*}$ is the number of edges in $T^*$.

(c) Conclude that $e_T + e_{T^*} = e$ (the number of edges in G).


(d) Conclude that $v + f = (e_T + 1) + (e_{T^*} + 1) = e + 2$, thereby
giving another proof of Euler’s theorem.


Chapter 3 


Inequalities and Constrained 
Extrema 


3.1 A Representative Example 


The thrust of this chapter can probably be summarized through the
following very simple example. We start with the very simple observa-
tion that for real numbers x and y, $0 \le (x - y)^2$. Expanding the right
hand side and rearranging gives the following inequality:

\[ 2xy \le x^2 + y^2, \]

again valid for all $x, y \in \mathbb{R}$. Furthermore, it is clear that equality ob-
tains precisely when $x = y$. We often refer to this as an unconditional
inequality, to be contrasted with inequalities which are true only for
certain values of the variable(s). This is, of course, analogous to the
distinction between “equations” and “identities” which students often
encounter.¹


We can recast the above problem as follows. 


¹ By way of reminder, the equality $x^2 - x - 6 = 0$ admits a solution, viz., $x = -2, 3$, whereas
the equality $x(x - 2) = x^2 - 2x$ is always true (by the distributive law), and hence is an identity.
PROBLEM. Given that $x^2 + y^2 = 4$, find the maximum value of $2xy$.


SOLUTION. If we are thinking in terms of the above-mentioned in-
equality $2xy \le x^2 + y^2$, with equality if and only if $x = y$, then we see
immediately that the maximum value of $2xy$ must be $x^2 + y^2 = 4$.
However, it is instructive to understand this problem in the context of
the graph to the right, where the “constraint curve” is the graph of
$x^2 + y^2 = 4$ and we’re trying to find the largest value of the constant c
for which the graph $2xy = c$ meets the constraint curve.


From the above figure, it is clear that $2xy$ attains its maximum
value at a point where the graph is tangent to the circle with equation
$x^2 + y^2 = 4$. This suggests that the solution can also be obtained
using the methodology of differential calculus (indeed, it can!), but in
this chapter we wish to stress purely algebraic techniques.


We can vary the problem slightly and ask to find the maximum value
of $xy$ given the same constraint $x^2 + y^2 = 4$. However, the maximum
of $xy$ is clearly 1/2 the maximum of $2xy$ and so the maximum value of
$xy$ is 2.

In an entirely similar fashion we see that the minimum value of $2xy$
given $x^2 + y^2 = 4$ must be $-4$. This can be seen from the above figure.
Even more elementary would be to apply the inequality
$0 \le (x + y)^2 \Rightarrow -2xy \le x^2 + y^2$.
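
These extreme values are easy to confirm numerically. The sketch below, in Python, walks around the constraint circle using the parametrization $x = 2\cos t$, $y = 2\sin t$ (our choice, not used in the text) and records the largest and smallest values of $2xy$; the function name and the number of sample points are likewise only illustrative.

```python
import math


def extremes_of_2xy_on_circle(radius_squared=4.0, samples=3600):
    """Sample 2xy along x^2 + y^2 = radius_squared; return (max, min)."""
    r = math.sqrt(radius_squared)
    values = []
    for k in range(samples):
        t = 2 * math.pi * k / samples       # angle of the sample point
        x, y = r * math.cos(t), r * math.sin(t)
        values.append(2 * x * y)
    return max(values), min(values)


largest, smallest = extremes_of_2xy_on_circle()
print(round(largest, 6), round(smallest, 6))   # 4.0 -4.0
```

The maximum occurs where $x = y$ and the minimum where $x = -y$, in agreement with the two inequalities $0 \le (x - y)^2$ and $0 \le (x + y)^2$.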
As a final variation on the above theme, note that we can interchange
the roles of constraint and “objective function” and ask for the extreme
values of $x^2 + y^2$ given the constraint $xy = 2$. The relevant figure is
given to the right. Notice that there is no maximum of $x^2 + y^2$, but that
the minimum value is clearly $x^2 + y^2 = 4$, again occurring at the points
of tangency.


EXERCISES. 


1. Find the maximum of the function $xy$ given the elliptical constraint
$4x^2 + y^2 = 6$. Draw the constraint graph and the “level curves”
whose equations are $xy$ = constant.


2. Given that $xy = -5$, find the maximum value of the objective
function $x^2 + 3y^2$.


3. Given that $xy = 10$, find the maximum value of the objective
function $x + y$.


4. Suppose that x and y are positive numbers with $x + y = 1$. Compute
the minimum value of $\left(1 + \frac{1}{x}\right)\left(1 + \frac{1}{y}\right)$.

3.2 Classical Unconditional Inequalities 


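Until further notice, we shall assume that the quantities $x_1, x_2, \ldots, x_n$
are all positive. Define


ARITHMETIC MEAN:

\[ \mathrm{AM}(x_1, x_2, \ldots, x_n) = \frac{x_1 + x_2 + \cdots + x_n}{n}; \]


GEOMETRIC MEAN:

\[ \mathrm{GM}(x_1, x_2, \ldots, x_n) = \sqrt[n]{x_1 x_2 \cdots x_n}; \]


HARMONIC MEAN:

\[ \mathrm{HM}(x_1, x_2, \ldots, x_n) = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}; \]


QUADRATIC MEAN:²

\[ \mathrm{QM}(x_1, x_2, \ldots, x_n) = \sqrt{\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{n}}. \]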


Note that if $a_1, a_2, \ldots$ is an arithmetic sequence, then $a_n$ is the
arithmetic mean of $a_{n-1}$ and $a_{n+1}$. Likewise if $a_1, a_2, \ldots$ is a geometric
sequence (and all $a_n > 0$), then $a_n$ is the geometric mean of $a_{n-1}$ and
$a_{n+1}$.

A harmonic sequence is by definition the sequence of reciprocals of
the (nonzero) terms of an arithmetic sequence. Thus, the sequences


dy 7; Ue ee 2D 
1,=, =,..., amd =, =, =,. 
2° 3 oe 
are harmonic sequences. In general, if $a_1, a_2, \ldots$ is a harmonic sequence,
then $a_n$ is the harmonic mean of $a_{n-1}$ and $a_{n+1}$.


One of our aims in this section is to prove the classical inequalities

\[ \mathrm{HM} \le \mathrm{GM} \le \mathrm{AM} \le \mathrm{QM}. \]


Before doing this in general (which will require mathematical induc- 
tion), it’s instructive first to verify the above in case n = 2. 
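Indeed, starting with $0 \le (\sqrt{x_1} - \sqrt{x_2})^2$ we expand and simplify the
result as

\[ 2\sqrt{x_1 x_2} \le x_1 + x_2 \ \Rightarrow\ \mathrm{GM}(x_1, x_2) \le \mathrm{AM}(x_1, x_2). \]


Having proved this, note next that

\[ \mathrm{HM}(x_1, x_2) = \left( \mathrm{AM}\left( \frac{1}{x_1}, \frac{1}{x_2} \right) \right)^{-1}; \]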


² Sometimes called the root mean square.
since we have already shown that $\mathrm{GM}(x, y) \le \mathrm{AM}(x, y)$ for $x, y > 0$,
we now have

\[ \mathrm{HM}(x_1, x_2) = \left( \mathrm{AM}\left( \frac{1}{x_1}, \frac{1}{x_2} \right) \right)^{-1} \le \left( \mathrm{GM}\left( \frac{1}{x_1}, \frac{1}{x_2} \right) \right)^{-1} = \mathrm{GM}(x_1, x_2). \]

Finally, note that since $2x_1x_2 \le x_1^2 + x_2^2$ (as proved in the above section),

\[ (x_1 + x_2)^2 = x_1^2 + 2x_1x_2 + x_2^2 \le 2(x_1^2 + x_2^2). \]

Divide both sides of the above inequality by 4, take square roots and
infer that $\mathrm{AM}(x_1, x_2) \le \mathrm{QM}(x_1, x_2)$.


For a geometric argument showing $\mathrm{HM} \le \mathrm{GM} \le \mathrm{AM}$, see Exercise 1,
below.
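
Before turning to proofs, it can be reassuring to watch the four means line up numerically. The Python sketch below implements the definitions directly; the function names `am`, `gm`, `hm`, `qm` and the sample values are our own, and the geometric mean is computed through logarithms, which assumes (as in the text) that all the numbers are positive.

```python
import math


def am(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)


def gm(xs):
    """Geometric mean of positive numbers, computed via logarithms."""
    return math.exp(sum(math.log(x) for x in xs) / len(xs))


def hm(xs):
    """Harmonic mean of positive numbers."""
    return len(xs) / sum(1 / x for x in xs)


def qm(xs):
    """Quadratic mean (root mean square)."""
    return math.sqrt(sum(x * x for x in xs) / len(xs))


xs = [1.0, 2.0, 4.0]   # hypothetical sample values
print(round(hm(xs), 4), round(gm(xs), 4), round(am(xs), 4), round(qm(xs), 4))
# 1.7143 2.0 2.3333 2.6458
print(hm(xs) <= gm(xs) <= am(xs) <= qm(xs))   # True
```

A numerical check of this kind is of course no proof; it only illustrates the chain of inequalities on one sample.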


We turn next to proofs of the above inequalities in the general case. 


$\mathrm{AM}(x_1, \ldots, x_n) \le \mathrm{QM}(x_1, \ldots, x_n)$: This is equivalent with saying that

\[ \frac{(x_1 + \cdots + x_n)^2}{n^2} \le \frac{x_1^2 + \cdots + x_n^2}{n}, \]

which is equivalent with proving that

\[ (x_1 + \cdots + x_n)^2 \le n(x_1^2 + \cdots + x_n^2). \]

By induction, we may assume that

\[ (x_1 + \cdots + x_{n-1})^2 \le (n - 1)(x_1^2 + \cdots + x_{n-1}^2). \]

Furthermore, note that for any real numbers x, y, we have
$0 \le (x - y)^2 = x^2 + y^2 - 2xy \Rightarrow 2xy \le x^2 + y^2$. Armed with this, we
proceed as follows:

\begin{align*}
(x_1 + \cdots + x_n)^2 &= (x_1 + \cdots + x_{n-1})^2 + 2x_n(x_1 + \cdots + x_{n-1}) + x_n^2 \\
&\le (n - 1)(x_1^2 + \cdots + x_{n-1}^2) + (x_1^2 + \cdots + x_{n-1}^2) + (n - 1)x_n^2 + x_n^2 \\
&= n(x_1^2 + \cdots + x_n^2),
\end{align*}

which proves that $\mathrm{AM} \le \mathrm{QM}$. Notice that since, for any $x_1, x_2, \ldots, x_n$,

\[ x_1 + x_2 + \cdots + x_n \le |x_1| + |x_2| + \cdots + |x_n|, \]

we see that $\mathrm{AM}(x_1, \ldots, x_n) \le \mathrm{QM}(x_1, \ldots, x_n)$ is true without the
assumption that all $x_i$ are positive.


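$\mathrm{GM}(x_1, \ldots, x_n) \le \mathrm{AM}(x_1, \ldots, x_n)$: Let $C = \sqrt[n]{x_1 x_2 \cdots x_n}$. If all
$x_i = C$, then

\[ \sqrt[n]{x_1 x_2 \cdots x_n} = C = \frac{x_1 + \cdots + x_n}{n}, \]

and we’re done in this case. Therefore, we may assume that at least
one of the $x_i$’s is less than C and that one is greater than C. Without
loss of generality, assume that $x_1 > C$ and that $C > x_2$. Therefore,
$(x_1 - C)(C - x_2) > 0$ and so

\[ x_1x_2 < C(x_1 + x_2) - C^2 \ \Rightarrow\ \frac{x_1 + x_2}{C} > \left(\frac{x_1}{C}\right)\left(\frac{x_2}{C}\right) + 1. \]

From this, we conclude

\begin{align*}
\frac{x_1 + x_2 + \cdots + x_n}{C} &> \frac{x_1x_2}{C^2} + \frac{x_3}{C} + \cdots + \frac{x_n}{C} + 1 \\
&\ge (n - 1)\sqrt[n-1]{(x_1x_2 \cdots x_n)/C^n} + 1 \quad \text{(using induction)} \\
&= (n - 1) + 1 = n.
\end{align*}

That is to say, in this case we have

\[ \frac{x_1 + x_2 + \cdots + x_n}{n} > C = \sqrt[n]{x_1 x_2 \cdots x_n}, \]

concluding the proof that $\mathrm{GM} \le \mathrm{AM}$. (For a much easier proof, see
Exercise 2 on page 160.)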


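$\mathrm{HM}(x_1, \ldots, x_n) \le \mathrm{GM}(x_1, \ldots, x_n)$: From the above we get

\[ \frac{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}{n} \ge \sqrt[n]{\frac{1}{x_1} \cdot \frac{1}{x_2} \cdots \frac{1}{x_n}}; \]

take reciprocals of both sides and infer that $\mathrm{HM} \le \mathrm{GM}$.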


A generalization of $\mathrm{AM} \le \mathrm{QM}$ is embodied in the very classical
Cauchy-Schwarz inequality. We state this as a theorem.


THEOREM 1. (Cauchy-Schwarz Inequality) Given
$x_1, x_2, \ldots, x_n, y_1, y_2, \ldots, y_n \in \mathbb{R}$, one has

\[ (x_1y_1 + x_2y_2 + \cdots + x_ny_n)^2 \le (x_1^2 + x_2^2 + \cdots + x_n^2)(y_1^2 + y_2^2 + \cdots + y_n^2). \]
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PROOF. We define a quadratic function of x: 


Q(x) = (x, —y1)? + +++ (22 — yn)? 


= (a? 4+---4+92)e? — (ay, +--+ atnn)e + (y2 +--+ y?). 


Since Q(x) > 0, we see that the discriminant must be < 0: 


A(ayyy +++ + 2nYyn)? A(x} ae x )(yj +++ +y3) = 0; 


so we’re done! Pretty slick, eh?? See Exercise 4, below. 
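
The proof can be mirrored in a few lines of code. This is a sketch under our own naming (`cauchy_schwarz_gap` and `q_discriminant` are hypothetical helpers): the first returns the difference of the two sides of the inequality, the second returns the discriminant of the quadratic $Q$ used in the proof, which should be $-4$ times that difference.

```python
def cauchy_schwarz_gap(xs, ys):
    """Return (sum x_i^2)(sum y_i^2) - (sum x_i*y_i)^2; the theorem says this is >= 0."""
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return sxx * syy - sxy * sxy

def q_discriminant(xs, ys):
    """Discriminant B^2 - 4AC of Q(t) = sum (t*x_i - y_i)^2 = A t^2 + B t + C."""
    a = sum(x * x for x in xs)
    b = -2 * sum(x * y for x, y in zip(xs, ys))
    c = sum(y * y for y in ys)
    return b * b - 4 * a * c

print(cauchy_schwarz_gap([1, 2, 3], [4, -5, 6]))   # non-negative
print(q_discriminant([1, 2, 3], [4, -5, 6]))       # non-positive
```

For proportional vectors such as (1, 2, 3) and (2, 4, 6) the gap is zero, corresponding to $Q$ having a double root.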


EXERCISES. 


1. The diagram to the right depicts a semicircle with center O and diameter XZ. If we write XZ = a + b, identify AM(a, b), GM(a, b), and HM(a, b) as lengths of segments in the diagram.


2. Suppose that $a_1, a_2, a_3, \ldots$ is a sequence of non-negative terms such that for each index $i > 1$, $a_i = QM(a_{i-1}, a_{i+1})$. Show that $a_1^2, a_2^2, a_3^2, \ldots$ is an arithmetic sequence.


3. Show that if a, b > 0 then

\[ \frac{a+b}{2} \le \frac{2}{3}\cdot\frac{a^2+ab+b^2}{a+b}. \]


4. Show how AM ≤ QM is a direct consequence of the Cauchy-Schwarz inequality.


5. State a necessary and sufficient condition for $AM(x_1,\ldots,x_n) = QM(x_1,\ldots,x_n)$.


³ The Cauchy-Schwarz inequality can be generalized to complex numbers where it reads:

\[ |x_1y_1 + x_2y_2 + \cdots + x_ny_n|^2 \le (|x_1|^2+|x_2|^2+\cdots+|x_n|^2)(|y_1|^2+|y_2|^2+\cdots+|y_n|^2). \]
6. Find the maximum value of the objective function $x + y + z$ given that $x^2+y^2+z^2 = 4$. (Hint: use $AM(x,y,z) \le QM(x,y,z)$.) Can you describe this situation geometrically?


7. Find the minimum value of the objective function $x^2+y^2+z^2$ given that $x+y+z = 6$.
8. Suppose that x and y are positive numbers with $x+y = 1$. Show that $\dfrac{1}{x}+\dfrac{1}{y} \ge 4$.


9. Suppose that x and y are positive numbers with $x+y = 1$. Compute the minimum value of $\left(1+\dfrac{1}{x}\right)\left(1+\dfrac{1}{y}\right)$. (This was already given as Exercise 4 on page 147. However, doesn’t it really belong in this section? Can you relate it to Exercise 8, above?)

10. Assume that $x_1, x_2, \ldots, x_n > 0$ and that $x_1+\cdots+x_n = 1$. Prove that
\[ \frac{1}{x_1}+\cdots+\frac{1}{x_n} \ge n^2. \]

(Hint: don’t use mathematical induction!)


11. Let $n \ge 2$, $x, y > 0$. Show that

\[ 2\sum_{k=1}^{n-1} x^ky^{n-k} \le (n-1)(x^n+y^n). \]

(This is somewhat involved; try arguing along the following lines.

(i) Let $P(x,y) = (n-1)(x^n+y^n) - 2\sum_{k=1}^{n-1} x^ky^{n-k}$; note that $P(y,y) = 0$ (i.e., $x = y$ is a zero of $P(x,y)$ regarded as a polynomial in $x$).

(ii) Show that $\dfrac{d}{dx}P(x,y)\Big|_{x=y} = 0$. Why does this show that $P(x,y)$ has at least a double zero at $x = y$?

(iii) Use Descartes’ Rule of Signs to argue that $P(x,y)$ has, for $x, y > 0$, only a double zero at $x = y$.

(iv) Show that this implies that $P(x,y) \ge 0$ when $x, y > 0$ with equality if and only if $x = y$.)


12. You are given △ABC and an interior point P with distances x to [BC], y to [AC] and z to [AB] as indicated. Let a = BC, b = AC, and c = AB.

(a) Find the point P which minimizes the objective function
\[ F = \frac{a}{x}+\frac{b}{y}+\frac{c}{z}. \]

(Hint: note that $ax+by+cz$ is proportional to the area of △ABC. If need be, refer back to Exercise 5 on page 17.⁴)

(b) Conclude from part (a) that the inradius r of △ABC (see page 17) is given by r = 2A/P, where A and P are the area and perimeter, respectively, of △ABC.


The next few exercises will introduce a geometrical notion of the mean of two positive numbers. To do this, fix a positive number $n \ne 1$ (which need not be an integer), and draw the graph of $y = x^n$ for non-negative $x$. For positive real numbers $a \ne b$, locate the points $P = P(a, a^n)$ and $Q = Q(b, b^n)$ on the graph. Draw the tangents to the graph at these points; the x-coordinate of the point of intersection of these tangents shall be denoted $S_n(a,b)$ and can be regarded as a type of mean of a and b. (If $a = b$, set $S_n(a,b) = a$.) See the figure below:

[Figure: the graph of $y = x^n$ with the tangent lines at $P = P(a,a^n)$ and $Q = Q(b,b^n)$; their intersection lies above the point $(S_n(a,b), 0)$.]

⁴ It turns out that P must be the incenter of △ABC.


12. Show that if a, b > 0, then
(a) $S_{-1}(a,b) = HM(a,b)$;
(b) $S_{1/2}(a,b) = GM(a,b)$;
(c) $S_2(a,b) = AM(a,b)$.
13. Show that

\[ S_n(a,b) = \frac{(n-1)(a^n-b^n)}{n(a^{n-1}-b^{n-1})}, \quad (a \ne b). \]


14. Show that if $2 \le m \le n$ are integers, and if a, b > 0 are real numbers, then $S_m(a,b) \le S_n(a,b)$. (Hint: this can be carried out in a way similar to the solution of Exercise 11.


(a) First note that

\[ \frac{(m-1)(a^m-b^m)}{m(a^{m-1}-b^{m-1})} \le \frac{(n-1)(a^n-b^n)}{n(a^{n-1}-b^{n-1})} \]

if and only if
\[ n(m-1)(a^m-b^m)(a^{n-1}-b^{n-1}) \le m(n-1)(a^n-b^n)(a^{m-1}-b^{m-1}). \]
(b) Next, define the polynomial

\[ P(a,b) = m(n-1)(a^n-b^n)(a^{m-1}-b^{m-1}) - n(m-1)(a^m-b^m)(a^{n-1}-b^{n-1}); \]
the objective is to show that $P(a,b) \ge 0$ when a, b, m, n are as given above. Regard the above as a polynomial in a and use Descartes’ Rule of Signs to conclude that (counting multiplicities) $P(a,b)$ has at most four positive real zeros.


(c) Note that $a = b$ is a zero of $P(a,b)$, i.e., that $P(b,b) = 0$. Next, show that


(d) Use the above to conclude that $P(a,b) \ge 0$ with equality if and only if $a = b$ (or $m = n$).)


3.3. Jensen’s Inequality 


If P and Q are points in the coordinate plane, then it’s easy to see that the set of points of the form $X = X(t) = (1-t)P + tQ$, $0 \le t \le 1$, is precisely the line segment joining P and Q. We shall call such a point a convex combination of P and Q. More generally, if $P_1, P_2, \ldots, P_n$ are points in the plane, and if $t_1, t_2, \ldots, t_n$ are non-negative real numbers satisfying $t_1+t_2+\cdots+t_n = 1$, then the point

\[ X = X(t) = t_1P_1 + t_2P_2 + \cdots + t_nP_n \]
is a convex combination of $P_1, P_2, \ldots, P_n$. The set of all such convex combinations is precisely the smallest convex polygon in the plane containing the points $P_1, P_2, \ldots, P_n$. Such a polygon is depicted below:

[Figure: the smallest convex polygon containing the points $P_1, P_2, \ldots, P_n$.]


Next, recall from elementary differential calculus that a twice-differentiable function f is concave down on an interval $[a,b]$ if $f''(x) \le 0$ for all $x \in [a,b]$. Geometrically, this means that if $a \le c \le d \le b$ then the convex combination of the points $P = P(c, f(c))$ and $Q = Q(d, f(d))$ lies on or below the graph of $y = f(x)$. Put more explicitly, this says that when $a \le c \le d \le b$, and when $0 \le t \le 1$,

\[ f((1-t)c + td) \ge (1-t)f(c) + tf(d). \]


LEMMA 1. (Jensen’s Inequality) Assume that the twice-differentiable function f is concave down on the interval $[a,b]$ and assume that $x_1, x_2, \ldots, x_n \in [a,b]$. If $t_1, t_2, \ldots, t_n$ are non-negative real numbers with $t_1+t_2+\cdots+t_n = 1$, then
\[ f(t_1x_1 + t_2x_2 + \cdots + t_nx_n) \ge t_1f(x_1) + t_2f(x_2) + \cdots + t_nf(x_n). \]


PROOF. We shall argue by induction on n with the result already being true for n = 2. We may assume that $0 < t_n < 1$; set

\[ x_0 = \frac{t_1x_1+\cdots+t_{n-1}x_{n-1}}{1-t_n}; \]

note that $x_0 \in [a,b]$. We have

\[
\begin{aligned}
f(t_1x_1 + t_2x_2 + \cdots + t_nx_n) &= f((1-t_n)x_0 + t_nx_n)\\
&\ge (1-t_n)f(x_0) + t_nf(x_n) \quad\text{(the case } n = 2\text{)}\\
&\ge (1-t_n)\left(\frac{t_1}{1-t_n}f(x_1)+\cdots+\frac{t_{n-1}}{1-t_n}f(x_{n-1})\right) + t_nf(x_n) \quad\text{(by induction)}\\
&= t_1f(x_1) + t_2f(x_2) + \cdots + t_nf(x_n),
\end{aligned}
\]

and we’re finished.
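
To see Jensen’s inequality at work on concrete numbers, one can evaluate both sides directly. This is a sketch only; `jensen_sides` is a hypothetical helper of ours, and the choice $f = \ln$ (concave down on any interval of positive numbers) and the sample weights are assumptions made for illustration.

```python
import math

def jensen_sides(f, xs, ts):
    """Return (f(sum t_i x_i), sum t_i f(x_i)) for weights t_i >= 0 that sum to 1."""
    assert all(t >= 0 for t in ts) and abs(sum(ts) - 1.0) < 1e-9
    lhs = f(sum(t * x for t, x in zip(ts, xs)))      # f of the convex combination
    rhs = sum(t * f(x) for t, x in zip(ts, xs))      # convex combination of the values
    return lhs, rhs

lhs, rhs = jensen_sides(math.log, [1.0, 2.0, 5.0], [0.2, 0.3, 0.5])
print(lhs >= rhs)   # expected for a concave-down f
```

With equal weights $t_i = 1/n$ and $f = \ln$, the same comparison is the logarithm of GM ≤ AM.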


3.4 The Hölder Inequality


Extending the notion of quadratic mean, we can define, for any real number $p \ge 1$, the “p-mean” of positive real numbers $x_1, \ldots, x_n$:

\[ pM(x_1, x_2, \ldots, x_n) = \sqrt[p]{\frac{x_1^p+x_2^p+\cdots+x_n^p}{n}}. \]

We shall show that if $1 \le p \le q$ then for positive real numbers $x_1, \ldots, x_n$ one has

\[ pM(x_1, x_2, \ldots, x_n) \le qM(x_1, x_2, \ldots, x_n). \]

The proof is not too difficult; a useful preparatory result is Young’s inequality, below.


LEMMA 2. (Young’s Inequality) Given real numbers $0 \le a, b$ and $0 < p, q$ such that $\dfrac{1}{p}+\dfrac{1}{q} = 1$, one has

\[ ab \le \frac{a^p}{p}+\frac{b^q}{q}, \]

with equality if and only if $a^p = b^q$.
PROOF. The proof involves a geometrical fact about the graph of the function $y = \ln x$, namely that for any two points P, Q on the graph, the straight line segment joining them lies completely below the graph.⁵ Thus, let $P = P(a^p, \ln(a^p))$ and $Q = Q(b^q, \ln(b^q))$ be two points on the graph of $y = \ln x$. For any value of the parameter t, $0 \le t \le 1$, the point $X = X(tb^q + (1-t)a^p,\; t\ln(b^q) + (1-t)\ln(a^p))$ is a point on the line segment PQ. Since the graph of $y = \ln x$ lies entirely above the line segment PQ, we conclude that

\[ \ln(tb^q + (1-t)a^p) \ge t\ln(b^q) + (1-t)\ln(a^p) = tq\ln b + (1-t)p\ln a. \]
Now set $t = 1/q$ and infer that

\[ \ln\left(\frac{b^q}{q}+\frac{a^p}{p}\right) \ge \ln b + \ln a = \ln(ab). \]

Exponentiating both sides yields the desired result, namely that

\[ \frac{b^q}{q}+\frac{a^p}{p} \ge ab. \]


THEOREM 2. (Hölder’s Inequality) Given real numbers $x_1, \ldots, x_n, y_1, \ldots, y_n$, and given non-negative real numbers p and q such that $\dfrac{1}{p}+\dfrac{1}{q} = 1$, then
\[ \sum_{i=1}^n |x_iy_i| \le \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}\left(\sum_{j=1}^n |y_j|^q\right)^{1/q}. \]


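PROOF. Let

\[ A = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}, \qquad B = \left(\sum_{j=1}^n |y_j|^q\right)^{1/q}. \]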
⁵ Another way to say this is, of course, that the graph of $y = \ln x$ is concave down.
We may assume that both $A, B \ne 0$, else the theorem is clearly true. Therefore by using Young’s inequality, we see that for each $i = 1, 2, \ldots, n$,

\[ \frac{|x_i|}{A}\cdot\frac{|y_i|}{B} \le \frac{|x_i|^p}{pA^p}+\frac{|y_i|^q}{qB^q}. \]

Therefore,

\[ \sum_{i=1}^n \frac{|x_iy_i|}{AB} \le \sum_{i=1}^n\left(\frac{|x_i|^p}{pA^p}+\frac{|y_i|^q}{qB^q}\right) = \frac{1}{p}+\frac{1}{q} = 1. \]

This implies that

\[ \sum_{i=1}^n |x_iy_i| \le AB = \left(\sum_{i=1}^n |x_i|^p\right)^{1/p}\left(\sum_{j=1}^n |y_j|^q\right)^{1/q}, \]

and we’re done.


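Note that if we set all $y_i = 1$ then we get

\[ \sum_{i=1}^n |x_i| \le n^{1/q}\left(\sum_{i=1}^n |x_i|^p\right)^{1/p} = n^{1-1/p}\left(\sum_{i=1}^n |x_i|^p\right)^{1/p}, \]
and so
\[ \frac{1}{n}\sum_{i=1}^n |x_i| \le \left(\frac{1}{n}\sum_{i=1}^n |x_i|^p\right)^{1/p} \]
for any $p > 1$. This proves that
\[ AM(|x_1|, |x_2|, \ldots, |x_n|) \le pM(|x_1|, |x_2|, \ldots, |x_n|) \]

whenever $p > 1$.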
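Finally, assume that $0 < p < q$ and assume that $x_1, x_2, \ldots, x_n$ are non-negative. We shall show that

\[ pM(x_1, x_2, \ldots, x_n) \le qM(x_1, x_2, \ldots, x_n). \]
Indeed, from the above, we have by setting $r = \dfrac{q}{p} > 1$ that

\[
\begin{aligned}
\frac{1}{n}\sum_{i=1}^n x_i^p &= AM(x_1^p, x_2^p, \ldots, x_n^p)\\
&\le rM(x_1^p, x_2^p, \ldots, x_n^p)\\
&= \left(\frac{1}{n}\sum_{i=1}^n (x_i^p)^r\right)^{1/r} = \left(\frac{1}{n}\sum_{i=1}^n x_i^q\right)^{p/q}.
\end{aligned}
\]

Taking the p-th roots of both sides yields what we were after, viz.,

\[ pM(x_1, \ldots, x_n) = \left(\frac{1}{n}\sum_{i=1}^n x_i^p\right)^{1/p} \le \left(\frac{1}{n}\sum_{i=1}^n x_i^q\right)^{1/q} = qM(x_1, \ldots, x_n). \]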
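
The monotonicity of the p-mean in p, just derived from Hölder’s inequality, can be observed on sample data. A minimal sketch, assuming positive inputs and $p > 0$; the function name `power_mean` is our own.

```python
def power_mean(xs, p):
    """Return the p-mean ((x_1^p + ... + x_n^p)/n)^(1/p) of positive numbers, p > 0."""
    n = len(xs)
    return (sum(x ** p for x in xs) / n) ** (1.0 / p)

xs = [1.0, 2.0, 3.0, 10.0]
values = [power_mean(xs, p) for p in (0.5, 1, 2, 3, 7)]
print(values)   # should be non-decreasing in p
```

Here $p = 1$ gives the arithmetic mean and $p = 2$ the quadratic mean, so the list also illustrates AM ≤ QM.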


EXERCISES. 


1. Show how Young’s inequality proves that $GM(x_1, x_2) \le AM(x_1, x_2)$, where $x_1, x_2 \ge 0$.


2. Use Jensen’s inequality and the fact that the graph of $y = \ln x$ is concave down to obtain a simple proof that

\[ AM(x_1, x_2, \ldots, x_n) \ge GM(x_1, x_2, \ldots, x_n), \]
where $x_1, x_2, \ldots, x_n > 0$.


3. Use Jensen’s inequality to prove that given interior angles A, B, and C of a triangle then

\[ \sin A + \sin B + \sin C \le 3\sqrt{3}/2. \]

Conclude that for a triangle △ABC inscribed in a circle of radius R, the maximum perimeter occurs for an equilateral triangle. (See Exercise 2 on page 34.)


4. Given △ABC with area K and side lengths a, b, and c, show that
\[ ab + ac + bc \ge 4\sqrt{3}\,K. \]

Under what circumstances does equality obtain? (Hint: note that $6K = ab\sin C + ac\sin B + bc\sin A$; use Cauchy-Schwarz together with Exercise 3, above.)
3.5 The Discriminant of a Quadratic


The discriminant of a quadratic polynomial, while finding itself in 
(mostly trivial) discussions in a typical high-school Algebra II course, 
nonetheless is a highly underused and too narrowly understood concept. 
This and the next two sections will attempt to provide meaningful ap- 
plications of the discriminant, as well as put it in its proper algebraic 
perspective. Before proceeding, let me remind the reader that a possi- 
bly surprising application of the discriminant has already occurred in 
the proof of the Cauchy-Schwarz inequality (page 150). 


Given the quadratic polynomial $f(x) = ax^2+bx+c$, $a, b, c \in \mathbb{R}$, the discriminant is defined by the familiar recipe:

\[ D = b^2 - 4ac. \]

This expression is typically introduced as a by-product of the quadratic formula expressing the two roots $\alpha, \beta$ of the equation $f(x) = 0$ as

\[ \alpha, \beta = \frac{-b\pm\sqrt{b^2-4ac}}{2a} = \frac{-b\pm\sqrt{D}}{2a}. \]

From the above, the following simple trichotomy emerges.

$D > 0 \iff f(x) = 0$ has two distinct real roots;

$D < 0 \iff f(x) = 0$ has two imaginary conjugate roots;

$D = 0 \iff f(x) = 0$ has a double real root.

Note that if $f(x) = ax^2+bx+c$ with $a > 0$, then the condition $D \le 0$ implies the unconditional inequality $f(x) \ge 0$.
The D = 0 case is the one we shall find to have many applications, especially to constrained extrema problems. Indeed, assuming this to be the case, and denoting $\alpha$ as the double root of $f(x) = 0$, then we have $f(x) = a(x-\alpha)^2$ and that the graph of $y = f(x)$ appears as depicted to the right.

The geometrical implication of the double real root is that the graph of $y = f(x)$ not only has an x-intercept at $x = \alpha$, it is tangent to the x-axis at this point. We shall find this observation extremely useful, as it provides application to a wealth of constrained extrema problems.


The first example is rather pedestrian, but it will serve to introduce 


the relevant methodology. 


EXAMPLE 1. Find the minimum value of the quadratic function

\[ f(x) = 2x^2 - 12x + 23. \]

SOLUTION. If we denote this minimum by m, then the graph of $y = m$ will be tangent to the graph of $y = f(x)$ where this minimum occurs. This says that in solving the quadratic equation $f(x) = m$, there must be a double root, i.e., the discriminant of the quadratic $2x^2 - 12x + 23 - m$ must vanish. The geometry of this situation is depicted to the right.

Solving for the discriminant in terms of m, we quickly obtain

\[
\begin{aligned}
0 &= b^2 - 4ac\\
&= (-12)^2 - 4\cdot 2\cdot(23-m)\\
&= 144 - 8(23-m)\\
&= 144 - 184 + 8m = -40 + 8m,
\end{aligned}
\]

and so one obtains the minimum value of m = 5. Of course, this is not the “usual” way students are taught to find the extreme values of a quadratic function: they use the method of “completing the square” (another useful technique).
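
The tangency argument of Example 1 amounts to solving $b^2 - 4a(c - m) = 0$ for m. A small sketch, with function names of our own choosing, assuming $a > 0$:

```python
def discriminant(a, b, c):
    """Discriminant b^2 - 4ac of the quadratic a x^2 + b x + c."""
    return b * b - 4 * a * c

def quadratic_min_by_tangency(a, b, c):
    """For a > 0, return the level m at which a x^2 + b x + (c - m) has a double root.

    Setting b^2 - 4a(c - m) = 0 and solving for m gives m = c - b^2/(4a).
    """
    return c - b * b / (4 * a)

m = quadratic_min_by_tangency(2, -12, 23)     # the quadratic of Example 1
print(m, discriminant(2, -12, 23 - m))        # the discriminant vanishes at this m
```

Completing the square, $2x^2 - 12x + 23 = 2(x-3)^2 + 5$, gives the same value.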


EXAMPLE 2. Here’s an ostensibly harder problem. Find the minimum value of the function $g(x) = x + \dfrac{1}{x}$, $x > 0$. Before going further, note that the ideas of Section 3.1 apply very naturally: from

\[ 0 \le \left(\sqrt{x} - \frac{1}{\sqrt{x}}\right)^2 = x + \frac{1}{x} - 2 \]

we see immediately that $x + \dfrac{1}{x} \ge 2$ with equality precisely when $x = 1$. That is to say, the minimum value of the objective function $x + \dfrac{1}{x}$ is 2.


SOLUTION. Denoting this minimum by m, then the graph of $y = m$ will again be tangent to the graph of $y = g(x)$ where this minimum occurs. Here, the equation is $x + \dfrac{1}{x} = m$, which quickly transforms to the quadratic equation $x^2 - mx + 1 = 0$. For tangency to occur, one must have that the discriminant of $x^2 - mx + 1$ vanishes.
We have

\[ 0 = b^2 - 4ac = m^2 - 4, \]
which immediately gives $m = \pm 2$. Only the value $m = 2$ is relevant here (as we assumed that $x > 0$) and this is the sought-after minimum value of g. (The extraneous value $m = -2$ can easily be seen to be the maximum value of $x + \dfrac{1}{x}$, $x < 0$.)


EXAMPLE 3. Find the equations of the two straight lines which pass through the point P(0, 5) and are both tangent to the graph of $y = 4 - x^2$.


SOLUTION. If we write a line with equation $\ell$: $y = 5 + mx$, where the slope m is to be determined, then we are solving $4 - x^2 = 5 + mx$ so that a double root occurs (i.e., tangency). Clearly, there should result two values of m for this to happen. Again, the discriminant is a very good tool. Write the quadratic equation having the multiple root as $x^2 + mx + 1 = 0$, and so
\[ 0 = b^2 - 4ac = m^2 - 4 \;\Rightarrow\; m = \pm 2. \]

Therefore, the two lines are given by equations

\[ y = 5 + 2x \quad\text{and}\quad y = 5 - 2x. \]

(The two points of tangency are at the points with coordinates $(\pm 1, 3)$.)
EXAMPLE 4. Given that $x^2 + 2y^2 = 6$, find the maximum value of $x + y$.


SOLUTION. This problem appears quite a bit different (and more difficult) than the preceding examples, but it’s not, and it fits in very well to the present discussion.⁶ This problem is very geometrical in nature, as the “constraint equation” $x^2 + 2y^2 = 6$ is an ellipse and the graphs of $x + y = c$ (c = constant) are parallel lines (with slope −1). We seek that value of c which gives the maximum value of $x + y$. See the graphic below:

[Figure: the ellipse $x^2 + 2y^2 = 6$ together with parallel lines $x + y = c$; the two tangent lines are the ones for which D = 0.]

Clearly the maximum value of $x + y$ will occur where this line is tangent to the ellipse. There will be two points of tangency, one in the third quadrant (where a minimum value of $x + y$ will occur) and one in the first quadrant (where the maximum value of $x + y$ occurs). Next, if we solve $x + y = c$ for y and substitute this into $x^2 + 2y^2 = 6$, then a quadratic equation in x will result. For tangency to occur, one must have that the discriminant is 0. From $y = c - x$, obtain

\[ x^2 + 2(c-x)^2 = 6 \;\Rightarrow\; 3x^2 - 4cx + 2c^2 - 6 = 0. \]

This leads to
\[ 0 = D = 16c^2 - 12(2c^2 - 6) = -8c^2 + 72 \;\Rightarrow\; c = \pm 3. \]

Therefore, the maximum value of $x + y$ is 3 (and the minimum value of $x + y$ is −3).

⁶ Problems of this sort are often not considered until such courses as Calculus III, where the method of Lagrange multipliers is applied.
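
As a cross-check on Example 4, one can compare the discriminant condition with a brute-force search along the ellipse. This is an illustrative sketch only: the parametrization $x = \sqrt{6}\cos t$, $y = \sqrt{3}\sin t$ and the helper names are our own assumptions, not part of the notes.

```python
import math

def disc_for_level(c):
    """Discriminant of 3x^2 - 4c x + (2c^2 - 6), obtained by putting y = c - x into x^2 + 2y^2 = 6."""
    return (-4 * c) ** 2 - 4 * 3 * (2 * c * c - 6)

def brute_force_max(steps=100000):
    """Largest value of x + y found on a grid of points of the ellipse x^2 + 2y^2 = 6."""
    best = -math.inf
    for k in range(steps):
        t = 2 * math.pi * k / steps
        x, y = math.sqrt(6) * math.cos(t), math.sqrt(3) * math.sin(t)
        best = max(best, x + y)
    return best

print(disc_for_level(3), disc_for_level(-3))   # both vanish: tangency at c = 3 and c = -3
print(brute_force_max())                       # close to 3
```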


EXERCISES. 


1. Given that $x + y = 1$, $x, y > 0$, find the minimum value of $\dfrac{1}{x}+\dfrac{1}{y}$.


2. Given that $\dfrac{1}{x}+\dfrac{1}{y} = 1$, $x, y > 0$, prove that $x + y \ge 4$.


(Exercises 1 and 2 can be solved very simply by multiplying together $\dfrac{1}{x}+\dfrac{1}{y}$ and $x + y$ and using the result of Example 2.)


3. Find the distance from the origin to the line with equation 
x+ 3y = 6. 


4. Given that = +y = 1 find the minimum value of 


Pa Os. Ca es. 


5. Find the largest value of a so that the parabola with equation $y = a - x^2$ is tangent to the circle with graph $x^2 + y^2 = 4$. Go on to argue that this value of a is the maximum of the function $x^2 + y$ given that $x^2 + y^2 = 4$.


6. Let $f(x) = ax^2 + bx + c$, and so the derivative is $f'(x) = 2ax + b$. Denote by $R(f)$ the determinant

\[ R(f) = \det\begin{pmatrix} a & b & c\\ 2a & b & 0\\ 0 & 2a & b \end{pmatrix}. \]

Show that $R(f) = -aD(f)$, where $D = D(f)$ is the discriminant of $f(x)$.
7. Above is depicted the circle whose equation is $x^2 + y^2 = r^2$, as well as the tangent line to this circle at the point R. The point $P = P(a, 0)$ is the intersection of this tangent line with the x-axis and the point $Q = Q(b, 0)$ has the same x-coordinate as the point R.


(a) Using a discriminant argument show that if m is the slope of the tangent line, then

\[ m^2 = \frac{r^2}{a^2 - r^2}. \]
Use this to show that $b = r^2/a$.


(b) Using the Secant-Tangent Theorem (see page 32), give another proof of the fact that $b = r^2/a$. Which is easier?

3.6 The Discriminant of a Cubic 


The quadratic $f(x) = ax^2 + bx + c$ has associated with it the discriminant $D = b^2 - 4ac$, which in turn elucidates the nature of the zeros of $f(x)$. This information is very helpful in detecting tangency, which can then be applied to constrained extrema problems. This raises at least a couple of questions. The immediate question raised here would be whether higher-degree polynomials also have discriminants. We’ll see that this is the case and will consider the case of cubic polynomials in this section. In the following section we’ll introduce the discriminant for arbitrary polynomials. The notion of the determinant of a matrix will play a central role here.


For the quadratic polynomial $f(x) = ax^2 + bx + c$ having zeros $x_1, x_2$, we define the “new” quantity

\[ \Delta = a^2\det\begin{pmatrix} 1 & x_1\\ 1 & x_2 \end{pmatrix}^2 = a^2(x_2 - x_1)^2. \]

At first blush, it doesn’t appear that $\Delta$ has anything to do with the discriminant D. However, once we have designated the zeros of f as being $x_1$ and $x_2$, then the Factor Theorem dictates that

\[ f(x) = a(x - x_1)(x - x_2). \]

Since also $f(x) = ax^2 + bx + c$ we conclude by expanding the above that

\[ b = -a(x_1 + x_2), \quad\text{and}\quad c = ax_1x_2. \]

Now watch this:

\[
\begin{aligned}
\Delta &= a^2(x_2 - x_1)^2\\
&= a^2(x_1^2 + x_2^2 - 2x_1x_2)\\
&= a^2[(x_1 + x_2)^2 - 4x_1x_2]\\
&= [-a(x_1 + x_2)]^2 - 4a(ax_1x_2)\\
&= b^2 - 4ac = D.
\end{aligned}
\]

In other words, $\Delta$ and D are the same:

\[ D = \Delta. \]

Therefore, D and $\Delta$ will satisfy the same trichotomy rule. But let’s try to develop the trichotomy rule directly in terms of $\Delta$ instead of D.
Case (i): $\Delta > 0$. That is to say, $(x_2 - x_1)^2 > 0$ and so certainly the zeros $x_1$ and $x_2$ are distinct. Furthermore, if they were not real, then they would have to be complex conjugates of one another and this would force (think about it!) $(x_2 - x_1)^2 < 0$ (as $x_2 - x_1$ would be purely imaginary). Therefore

\[ \Delta > 0 \implies f \text{ has two distinct real zeros.} \]

Case (ii): $\Delta = 0$. This is clear in that one immediately has that $x_1 = x_2$. That is to say

\[ \Delta = 0 \implies f \text{ has a double zero.} \]

Case (iii): $\Delta < 0$. Since $(x_2 - x_1)^2 < 0$ we certainly cannot have both $x_1$ and $x_2$ real. Therefore, they’re both complex (non-real) as they are complex conjugate. Therefore

\[ \Delta < 0 \implies f \text{ has two complex (non-real) zeros.} \]

That is to say, D and $\Delta$ satisfy the same trichotomy law!


Whereas the definition of D does not suggest a generalization to higher-degree polynomials, the definition of $\Delta$ can be easily generalized. We consider first a natural generalization to the cubic polynomial

\[ P(x) = ax^3 + bx^2 + cx + d, \quad a, b, c, d \in \mathbb{R},\ a \ne 0. \]

By the Fundamental Theorem of Algebra, we know that (counting multiplicities), $P(x)$ has three zeros; we shall denote them by $x_1$, $x_2$, and $x_3$. They may be real or complex, but we do know that one of these zeros must be real.


We set

\[ \Delta = a^4\det\begin{pmatrix} 1 & x_1 & x_1^2\\ 1 & x_2 & x_2^2\\ 1 & x_3 & x_3^2 \end{pmatrix}^2. \]

With a bit of effort, this determinant can be expanded. It’s easier to first compute the determinant of the matrix

\[ \begin{pmatrix} 1 & x_1 & x_1^2\\ 1 & x_2 & x_2^2\\ 1 & x_3 & x_3^2 \end{pmatrix} \]
and then square the result. One has, after a bit of computation, the highly structured answer

\[ \det\begin{pmatrix} 1 & x_1 & x_1^2\\ 1 & x_2 & x_2^2\\ 1 & x_3 & x_3^2 \end{pmatrix} = (x_3 - x_2)(x_3 - x_1)(x_2 - x_1), \]
(this is generalized in the next section) which implies that

\[ \Delta = a^4(x_3 - x_2)^2(x_3 - x_1)^2(x_2 - x_1)^2. \]


This is all well and good, but two questions immediately arise:

• How does one compute $\Delta$ without knowing the zeros of P? Also, and perhaps more importantly,

• what is $\Delta$ trying to tell us?

Let’s start with the second bullet point and work out the trichotomy law dictated by $\Delta$.

If $P(x)$ has three distinct real zeros, then it’s obvious that $\Delta > 0$. If not all of the zeros are real, then $P(x)$ has one real zero (say $x_1$) and a complex-conjugate pair of non-real zeros ($x_2$ and $x_3$). In this case $(x_2 - x_1)$, $(x_3 - x_1)$ would be a complex conjugate pair, forcing $0 < (x_2 - x_1)(x_3 - x_1) \in \mathbb{R}$ and so certainly that $0 < (x_2 - x_1)^2(x_3 - x_1)^2 \in \mathbb{R}$. Furthermore, $(x_3 - x_2)$ is purely imaginary and so $(x_3 - x_2)^2 < 0$, all forcing $\Delta < 0$. Therefore, we see immediately that

\[ \Delta > 0 \implies P(x) \text{ has three distinct real zeros} \]
and that
\[ \Delta < 0 \implies P(x) \text{ has one real zero and two non-real complex zeros.} \]

This is all rounded out by the obvious statement that

\[ \Delta = 0 \implies P(x) \text{ has a multiple zero and all zeros are real.} \]


Of course, none of the above is worth much unless we have a method of computing $\Delta$. The trick is to proceed as in the quadratic case and compute $\Delta$ in terms of the coefficients of $P(x)$. We start with the observation that

\[ P(x) = a(x - x_1)(x - x_2)(x - x_3), \]
all of which implies that (by expanding)

\[ b = -a(x_1 + x_2 + x_3), \quad c = a(x_1x_2 + x_1x_3 + x_2x_3), \quad d = -ax_1x_2x_3. \]
We set
\[ \sigma_1 = x_1 + x_2 + x_3, \quad \sigma_2 = x_1x_2 + x_1x_3 + x_2x_3, \quad \sigma_3 = x_1x_2x_3, \]

and call them the elementary symmetric polynomials (in $x_1, x_2, x_3$). On the other hand, by expanding out $\Delta$, one has that (after quite a bit of very hard work!)

\[
\begin{aligned}
\Delta &= a^4(x_3 - x_2)^2(x_3 - x_1)^2(x_2 - x_1)^2\\
&= a^4(-4\sigma_1^3\sigma_3 + \sigma_1^2\sigma_2^2 + 18\sigma_1\sigma_2\sigma_3 - 4\sigma_2^3 - 27\sigma_3^2)\\
&= -4b^3d + b^2c^2 + 18abcd - 4ac^3 - 27a^2d^2,
\end{aligned}
\]

giving a surprisingly complicated homogeneous polynomial in the coefficients a, b, c, and d. (See Exercise 6 below for a more direct method for computing $\Delta$.)
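
The agreement between the two expressions for $\Delta$ (one in terms of the zeros, one in terms of the coefficients) can be tested on cubics built from chosen zeros. The sketch below uses helper names of our own; integer zeros keep the arithmetic exact.

```python
def cubic_discriminant(a, b, c, d):
    """Discriminant of a x^3 + b x^2 + c x + d in terms of the coefficients."""
    return -4 * b**3 * d + b * b * c * c + 18 * a * b * c * d - 4 * a * c**3 - 27 * a * a * d * d

def cubic_discriminant_from_roots(a, r1, r2, r3):
    """a^4 (r3 - r2)^2 (r3 - r1)^2 (r2 - r1)^2, the definition in terms of the zeros."""
    return a**4 * ((r3 - r2) * (r3 - r1) * (r2 - r1)) ** 2

def coefficients_from_roots(a, r1, r2, r3):
    """Coefficients (a, b, c, d) of a(x - r1)(x - r2)(x - r3)."""
    s1, s2, s3 = r1 + r2 + r3, r1 * r2 + r1 * r3 + r2 * r3, r1 * r2 * r3
    return a, -a * s1, a * s2, -a * s3

coeffs = coefficients_from_roots(1, 1, 2, 3)            # x^3 - 6x^2 + 11x - 6
print(cubic_discriminant(*coeffs), cubic_discriminant_from_roots(1, 1, 2, 3))
```

A negative value, as for $x^3 - 1$, signals one real zero and a pair of non-real zeros, in line with the trichotomy above.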


We’ll close this section with a representative example. Keep in mind that just as in the case of the quadratic, when the discriminant of a cubic is 0, then the graph of this cubic is tangent to the x-axis at the multiple zero.

EXAMPLE. Compute the minimum value of the function

\[ f(x) = \frac{1}{x^2} + x, \quad x > 0. \]

SOLUTION. The minimum value will occur where the line $y = c$ is tangent to the graph of $y = f(x)$. We may write the equation $f(x) = c$ in the form of a cubic polynomial in x:

\[ x^3 - cx^2 + 1 = 0. \]

As the tangent indicates a multiple zero, we must have $\Delta = 0$. As $a = 1$, $b = -c$, $c = 0$, $d = 1$, we get the equation $4c^3 - 27 = 0$, which implies that the minimum value is given by $c = \dfrac{3}{\sqrt[3]{4}}$ (which can be verified by standard calculus techniques).
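
The value $c = 3/\sqrt[3]{4}$ can also be checked without calculus by a crude grid search. A sketch only: the search interval, step count, and helper names are assumptions of ours.

```python
def cubic_discriminant(a, b, c, d):
    """Discriminant of a x^3 + b x^2 + c x + d."""
    return -4 * b**3 * d + b * b * c * c + 18 * a * b * c * d - 4 * a * c**3 - 27 * a * a * d * d

def level_discriminant(level):
    """Discriminant of x^3 - level*x^2 + 1, which simplifies to 4*level^3 - 27."""
    return cubic_discriminant(1.0, -level, 0.0, 1.0)

def f(x):
    return x + 1.0 / (x * x)

def grid_min(lo=0.5, hi=3.0, steps=25000):
    """Smallest value of f on an evenly spaced grid of [lo, hi]."""
    return min(f(lo + (hi - lo) * k / steps) for k in range(steps + 1))

c_star = 3.0 / 4.0 ** (1.0 / 3.0)
print(level_discriminant(c_star))   # essentially zero
print(grid_min(), c_star)           # the two values should nearly agree
```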


Now try these: 


EXERCISES. 


1. Compute the minimum of the function

\[ h(x) = \frac{1}{x} + x^2, \quad x > 0. \]


2. Compute the minimum of $2y - x$ given that
$x^3 - x^2y + 1 = 0$.


3. Compute the maximum value of $y + x^2$ given that
$x^3 - xy + 2 = 0$.


4. Compute the maximum value of $xy$, given that
$x^2 + y = 4$.


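5. The polynomial $s(x_1, x_2, x_3) = x_1^3 + x_2^3 + x_3^3$ is symmetric in $x_1, x_2, x_3$ and can be expanded in the elementary symmetric polynomials

\[ \sigma_1 = x_1 + x_2 + x_3, \quad \sigma_2 = x_1x_2 + x_1x_3 + x_2x_3, \quad \sigma_3 = x_1x_2x_3. \]
Watch this:

\[
\begin{aligned}
x_1^3 + x_2^3 + x_3^3 &= (x_1 + x_2 + x_3)^3\\
&\quad - 3x_1^2(x_2 + x_3) - 3x_2^2(x_1 + x_3) - 3x_3^2(x_1 + x_2) - 6x_1x_2x_3\\
&= (x_1 + x_2 + x_3)^3 - 3(x_1 + x_2 + x_3)(x_1x_2 + x_1x_3 + x_2x_3) + 3x_1x_2x_3\\
&= \sigma_1^3 - 3\sigma_1\sigma_2 + 3\sigma_3.
\end{aligned}
\]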


Now try to write the symmetric polynomial $x_1^4 + x_2^4 + x_3^4$ as a polynomial in $\sigma_1, \sigma_2, \sigma_3$.


6. Let $f(x) = ax^3 + bx^2 + cx + d$, and so the derivative is $f'(x) = 3ax^2 + 2bx + c$. Denote by $R(f)$ the determinant

\[ R(f) = \det\begin{pmatrix} a & b & c & d & 0\\ 0 & a & b & c & d\\ 3a & 2b & c & 0 & 0\\ 0 & 3a & 2b & c & 0\\ 0 & 0 & 3a & 2b & c \end{pmatrix}. \]

Show that $R(f) = -aD(f)$, where $D = D(f)$ is the discriminant of $f(x)$. (This is the generalization to cubic polynomials of the result of Exercise 6 of the preceding section.)


3.7 The Discriminant (Optional Discussion) 


In this section I’ll give a couple of equivalent definitions of the discriminant of a polynomial of arbitrary degree. Not all proofs will be given, but an indication of what’s involved will be outlined. To this end, let there be given the polynomial

\[ f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0, \]

where $a_n \ne 0$ and where all coefficients are real. Denoting by $x_1, x_2, \ldots, x_n$ the zeros of $f(x)$ (which may include several complex-conjugate pairs of imaginary zeros), we know that (by the Factor Theorem)

\[ f(x) = a_n(x - x_1)(x - x_2)\cdots(x - x_n). \]

In analogy with the above work, we define the discriminant of $f(x)$ by setting

\[ \Delta = \Delta(f) = a_n^{2n-2}\det\begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{pmatrix}^2. \]

The above involves the determinant of the so-called Vandermonde matrix,

\[ V = \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1}\\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{pmatrix}, \]

which makes frequent appearances throughout mathematics. Its determinant is given in the next theorem.


THEOREM 3. $\det V = \prod_{i<j} (x_j - x_i)$.


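PROOF. We argue by induction on n. Setting $\Delta = \det V$, we start by subtracting row 1 from rows $2, 3, \ldots, n$, which quickly produces
\[ \Delta = \det\begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1}\\ 0 & x_2 - x_1 & x_2^2 - x_1^2 & \cdots & x_2^{n-1} - x_1^{n-1}\\ \vdots & \vdots & \vdots & & \vdots\\ 0 & x_n - x_1 & x_n^2 - x_1^2 & \cdots & x_n^{n-1} - x_1^{n-1} \end{pmatrix}. \]

Next, in each row we factor out the common factor of $x_i - x_1$, $i = 2, 3, \ldots, n$, which leads to

\[ \Delta = (x_2 - x_1)(x_3 - x_1)\cdots(x_n - x_1) \times \det\begin{pmatrix} 1 & x_2 + x_1 & x_2^2 + x_2x_1 + x_1^2 & \cdots & x_2^{n-2} + x_2^{n-3}x_1 + \cdots + x_1^{n-2}\\ 1 & x_3 + x_1 & x_3^2 + x_3x_1 + x_1^2 & \cdots & x_3^{n-2} + x_3^{n-3}x_1 + \cdots + x_1^{n-2}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & x_n + x_1 & x_n^2 + x_nx_1 + x_1^2 & \cdots & x_n^{n-2} + x_n^{n-3}x_1 + \cdots + x_1^{n-2} \end{pmatrix}. \]

Next, if we subtract $x_1$ times column $n-2$ from column $n-1$, then subtract $x_1$ times column $n-3$ from column $n-2$, and so on, we’ll eventually reach

\[
\begin{aligned}
\Delta &= (x_2 - x_1)(x_3 - x_1)\cdots(x_n - x_1) \times \det\begin{pmatrix} 1 & x_2 & x_2^2 & \cdots & x_2^{n-2}\\ 1 & x_3 & x_3^2 & \cdots & x_3^{n-2}\\ \vdots & \vdots & \vdots & & \vdots\\ 1 & x_n & x_n^2 & \cdots & x_n^{n-2} \end{pmatrix}\\
&= (x_2 - x_1)(x_3 - x_1)\cdots(x_n - x_1) \times \prod_{j>i\ge 2}(x_j - x_i) = \prod_{j>i}(x_j - x_i).
\end{aligned}
\]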
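
Theorem 3 can be confirmed on small examples with exact arithmetic. The following sketch uses a plain Gaussian-elimination determinant over `Fraction`; all function names are our own, and the elimination routine is just one convenient way to compute a determinant, not the method of the proof.

```python
from fractions import Fraction
from itertools import combinations

def det(matrix):
    """Determinant by Gaussian elimination over exact Fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)                 # a zero column: the determinant vanishes
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result                   # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return result

def vandermonde(xs):
    """Rows (1, x, x^2, ..., x^(n-1)), one for each x in xs."""
    n = len(xs)
    return [[x ** k for k in range(n)] for x in xs]

def product_of_differences(xs):
    """Product of (x_j - x_i) over all pairs i < j."""
    prod = 1
    for i, j in combinations(range(len(xs)), 2):
        prod *= xs[j] - xs[i]
    return prod

xs = [2, -1, 5, 3]
print(det(vandermonde(xs)), product_of_differences(xs))
```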
From the above, we see that

\[ \Delta(f) = a_n^{2n-2}\prod_{1\le i<j\le n}(x_j - x_i)^2. \]

The difficulty with the above expression is that its computation appears to require the zeros of $f(x)$. However, this is a symmetric polynomial in the “variables” $x_1, x_2, \ldots, x_n$ and hence⁷ can be written as a polynomial in the elementary symmetric polynomials

\[
\begin{aligned}
\sigma_1 &= x_1 + x_2 + \cdots + x_n\\
\sigma_2 &= \sum_{i<j} x_ix_j\\
\sigma_3 &= \sum_{i<j<k} x_ix_jx_k\\
&\;\;\vdots\\
\sigma_n &= x_1x_2\cdots x_n
\end{aligned}
\]

This was carried out for the quadratic polynomial $f(x) = ax^2 + bx + c$ on page 168; the result for the cubic polynomial $f(x) = ax^3 + bx^2 + cx + d$ was given on page 171. Carrying this out for higher degree polynomials is quite difficult in practice (see, e.g., Exercise 5, above). The next two subsections will lead to a more direct (but still computationally very complex) method for computing the discriminant of an arbitrary polynomial.


3.7.1 The resultant of f(x) and g(x) 


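Let $f(x) = a_nx^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$, and $g(x) = b_mx^m + b_{m-1}x^{m-1} + \cdots + b_1x + b_0$ be polynomials having real coefficients of degrees n and m, respectively. Define the $(n+m) \times (n+m)$ Sylvester matrix relative to $f(x)$ and $g(x)$, $S(f,g)$, by setting

\[
S(f,g) = \begin{pmatrix}
a_n & a_{n-1} & \cdots & a_0 & 0 & \cdots & 0\\
0 & a_n & \cdots & a_1 & a_0 & \cdots & 0\\
\vdots & & \ddots & & & \ddots & \vdots\\
0 & \cdots & 0 & a_n & \cdots & a_1 & a_0\\
b_m & b_{m-1} & \cdots & b_0 & 0 & \cdots & 0\\
0 & b_m & \cdots & b_1 & b_0 & \cdots & 0\\
\vdots & & \ddots & & & \ddots & \vdots\\
0 & \cdots & 0 & b_m & \cdots & b_1 & b_0
\end{pmatrix},
\]

in which the first m rows carry the coefficients of f and the last n rows carry the coefficients of g.

⁷ This follows from the so-called Fundamental Theorem on Symmetric Polynomials.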

The resultant $R(f,g)$ of $f(x)$ and $g(x)$ is the determinant of the corresponding Sylvester matrix:

\[ R(f,g) = \det S(f,g). \]

For example, if $f(x) = a_2x^2 + a_1x + a_0$ and $g(x) = b_3x^3 + b_2x^2 + b_1x + b_0$, then

\[ S(f,g) = \begin{pmatrix} a_2 & a_1 & a_0 & 0 & 0\\ 0 & a_2 & a_1 & a_0 & 0\\ 0 & 0 & a_2 & a_1 & a_0\\ b_3 & b_2 & b_1 & b_0 & 0\\ 0 & b_3 & b_2 & b_1 & b_0 \end{pmatrix}. \]

Note that the resultant of two polynomials clearly remains unchanged upon field extension.
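
The construction of the Sylvester matrix is mechanical enough to code directly. The sketch below assumes polynomials are given as coefficient lists with the highest-degree coefficient first; `sylvester`, `resultant`, and the elimination-based `det` are hypothetical helpers of ours.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by Gaussian elimination over exact Fractions."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n, result = len(m), Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return result

def sylvester(f, g):
    """Sylvester matrix of f (degree n) and g (degree m): m shifted rows of f, then n shifted rows of g."""
    n, m = len(f) - 1, len(g) - 1
    size = n + m
    rows = [[0] * i + list(f) + [0] * (size - n - 1 - i) for i in range(m)]
    rows += [[0] * i + list(g) + [0] * (size - m - 1 - i) for i in range(n)]
    return rows

def resultant(f, g):
    return det(sylvester(f, g))

print(resultant([1, -3, 2], [1, 2, -8]))   # (x-1)(x-2) and (x-2)(x+4) share the zero x = 2
```

In this example the resultant vanishes, which is consistent with the two polynomials having a common zero.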

We aim to list a few simple—albeit technical—results about the 
resultant. The first is reasonably straightforward and is proved by just 
keeping track of the sign changes introduced by swapping rows in a 
determinant. 


LEMMA 2.
\[ R(f,g) = (-1)^{mn}R(g,f), \]
where $\deg f = n$ and $\deg g = m$.

Next, write

\[ f(x) = a_n(x^n + a'_{n-1}x^{n-1} + \cdots + a'_1x + a'_0), \quad a'_i = a_i/a_n,\ i = 0, 1, \ldots, n-1; \]
similarly, write
\[ g(x) = b_m(x^m + b'_{m-1}x^{m-1} + \cdots + b'_1x + b'_0), \quad b'_j = b_j/b_m,\ j = 0, 1, \ldots, m-1. \]

It follows easily that $R(f,g) = a_n^mb_m^nR(f/a_n, g/b_m)$, which reduces computations to resultants of monic polynomials.


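THEOREM 4. Let $f(x)$, $g(x)$, and $h(x)$ be polynomials with real coefficients. Then

\[ R(fg, h) = R(f,h)R(g,h). \]

PROOF. From the above, it suffices to assume that all polynomials are monic (have leading coefficient 1). Furthermore, by the Fundamental Theorem of Algebra $f(x)$ splits into linear factors, and so it is sufficient to prove that the above result is true when $f(x) = x + a$ is linear. Here, we let

\[ g(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0, \quad\text{and} \]

\[ h(x) = x^m + b_{m-1}x^{m-1} + \cdots + b_1x + b_0. \]

First of all, one obviously has that

\[ R(x + a, h) = \det\begin{pmatrix} S(x+a, h(x)) & Z\\ 0_{n,m+1} & I_n \end{pmatrix}, \]
where Z is the $(m+1) \times n$ matrix
\[ Z = \begin{pmatrix} 0 & \cdots & 0 & 0\\ \vdots & & \vdots & \vdots\\ 0 & \cdots & 0 & 0\\ -a_{n-1} & \cdots & -a_1 & -a_0 \end{pmatrix}. \]

Next, it is equally clear that
\[ R(g,h) = \det\begin{pmatrix} 1 & a_{n-1} \;\cdots\; a_1 \;\; a_0 \;\; 0 \;\cdots\; 0\\ \begin{matrix} 0\\ \vdots\\ 0 \end{matrix} & S(g,h) \end{pmatrix}. \]
Finally, one checks that
\[ \begin{pmatrix} S(x+a, h(x)) & Z\\ 0_{n,m+1} & I_n \end{pmatrix}\begin{pmatrix} 1 & a_{n-1} \;\cdots\; a_1 \;\; a_0 \;\; 0 \;\cdots\; 0\\ \begin{matrix} 0\\ \vdots\\ 0 \end{matrix} & S(g,h) \end{pmatrix} = S((x+a)g(x), h(x)), \]

and we’re done.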
Since $R(x - a, x - b) = a - b$, we immediately obtain the following:


COROLLARY 1. Let $f(x)$, $g(x)$ be polynomials with real coefficients, and have leading coefficients $a_n$ and $b_m$, respectively. Assume $f(x)$ has zeros $\alpha_1, \ldots, \alpha_n$, and that $g(x)$ has zeros $\beta_1, \ldots, \beta_m$. Then

\[ R(f,g) = a_n^mb_m^n\prod_{i=1}^n\prod_{j=1}^m(\alpha_i - \beta_j). \]

COROLLARY 2. $R(f,g) = 0$ if and only if $f(x)$ and $g(x)$ have a common zero.


The following corollary will be quite important in the next section.


COROLLARY 3. Let $f(x)$, $g(x)$ be polynomials with real coefficients, and have leading coefficients $a_n$ and $b_m$, respectively. Assume $f(x)$ has zeros $\alpha_1, \ldots, \alpha_n$. Then

\[ R(f,g) = a_n^m\prod_{i=1}^n g(\alpha_i). \]

PROOF. Let $g(x)$ have zeros $\beta_1, \ldots, \beta_m$. Since

\[ g(x) = b_m(x - \beta_1)\cdots(x - \beta_m), \]
we see that $g(\alpha_i) = b_m(\alpha_i - \beta_1)\cdots(\alpha_i - \beta_m)$. From this, the result follows instantly.


3.7.2 The discriminant as a resultant 


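As already given above, the discriminant of the polynomial $f(x) = a_n x^n + {}$ lower-degree terms, having zeros $x_1, x_2, \ldots, x_n$, is given by

\[ \Delta(f) = a_n^{2n-2} \det \begin{pmatrix} 1 & x_1 & x_1^2 & \cdots & x_1^{n-1} \\ 1 & x_2 & x_2^2 & \cdots & x_2^{n-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_n & x_n^2 & \cdots & x_n^{n-1} \end{pmatrix}^{2} = a_n^{2n-2} \prod_{i<j} (x_j - x_i)^2. \]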


We relate the above to the resultant as follows. Let $f(x) = a_n x^n + {}$ lower-degree terms, where $a_n \ne 0$, and let $\alpha_1, \ldots, \alpha_n$ be the zeros of $f(x)$. Then, using Corollary 3 above, and letting $f' = f'(x)$ be the derivative of $f$, we have that

\[ R(f, f') = a_n^{n-1} \prod_{i=1}^{n} f'(\alpha_i). \]

Next, we have $f(x) = a_n(x - \alpha_1) \cdots (x - \alpha_n)$; applying the product rule for differentiation leads quickly to

\[ f'(x) = a_n \sum_{i=1}^{n} (x - \alpha_1) \cdots \widehat{(x - \alpha_i)} \cdots (x - \alpha_n), \]

where the convention is that the factor under the $\widehat{\ }$ is omitted. From the above, we see immediately that

\[ f'(\alpha_i) = a_n \prod_{j \ne i} (\alpha_i - \alpha_j), \]

and so

\[ R(f, f') = a_n^{n-1} \prod_{i=1}^{n} f'(\alpha_i) = a_n^{2n-1} \prod_{i=1}^{n} \prod_{j \ne i} (\alpha_i - \alpha_j) = a_n^{2n-1} \prod_{j \ne i} (\alpha_i - \alpha_j). \]

Finally, one checks that

\[ \prod_{j \ne i} (\alpha_i - \alpha_j) = (-1)^{n(n-1)/2} \prod_{1 \le i < j \le n} (\alpha_j - \alpha_i)^2, \]

which gives the following relationship between the resultant of $f(x)$ and $f'(x)$ and the discriminant:


THEOREM 5. Given the polynomial $f(x)$ with real coefficients, of degree $n$ and with leading coefficient $a_n$, one has that

\[ R(f, f') = (-1)^{n(n-1)/2}\, a_n\, \Delta(f). \]


If we return to the case of the quadratic $f(x) = ax^2 + bx + c$, then

\[ R(f, f') = R(ax^2 + bx + c,\; 2ax + b) = \det \begin{pmatrix} a & b & c \\ 2a & b & 0 \\ 0 & 2a & b \end{pmatrix} = -(ab^2 - 4a^2c) = -a(b^2 - 4ac). \]

Since for $n = 2$ we have $(-1)^{n(n-1)/2} = -1$, we see that, indeed, $R(f, f') = (-1)^{n(n-1)/2}\, a\, \Delta(f)$ in this familiar case.


Note, finally, that as a result of the representation of $\Delta(f)$ in terms of $R(f, f')$ we see that $\Delta(f)$ is a homogeneous polynomial of degree $2n - 2$ in the coefficients $a_0, a_1, \ldots, a_n$.
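
The quadratic case lends itself to a quick machine check. What follows is only a minimal sketch and not part of the original notes: the helper names `sylvester`, `det`, `resultant` and `derivative` are hypothetical, polynomials are assumed to be given as coefficient lists with the leading coefficient first, and the determinant is computed by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction

def sylvester(f, g):
    """Sylvester matrix of f and g (coefficient lists, leading coefficient first)."""
    n, m = len(f) - 1, len(g) - 1            # degrees of f and g
    size = n + m
    rows = []
    for i in range(m):                        # rows 1..m: shifted copies of f
        rows.append([0] * i + list(f) + [0] * (size - n - 1 - i))
    for i in range(n):                        # rows m+1..m+n: shifted copies of g
        rows.append([0] * i + list(g) + [0] * (size - m - 1 - i))
    return rows

def det(matrix):
    """Determinant by Gaussian elimination with exact rational arithmetic."""
    a = [[Fraction(x) for x in row] for row in matrix]
    size, sign, result = len(a), 1, Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:                     # no pivot: the matrix is singular
            return Fraction(0)
        if pivot != col:                      # a row swap flips the sign
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign
        result *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return sign * result

def resultant(f, g):
    """R(f, g) = det S(f, g)."""
    return det(sylvester(f, g))

def derivative(f):
    """Coefficient list of f'."""
    n = len(f) - 1
    return [coeff * (n - i) for i, coeff in enumerate(f[:-1])]

a, b, c = 2, 3, -5                            # sample quadratic 2x^2 + 3x - 5
f = [a, b, c]
print(resultant(f, derivative(f)))            # R(f, f')
print(-a * (b * b - 4 * a * c))               # -a(b^2 - 4ac); the two lines agree
```

The same helper also illustrates Corollary 2: `resultant([1, -3, 2], [1, 3, -10])` vanishes, because $(x-1)(x-2)$ and $(x-2)(x+5)$ share the zero $x = 2$.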
3.7.3 A special class of trinomials


We shall start this discussion with a specific example. Let $f(x) = a_3x^3 + a_2x^2 + a_1x + a_0$ and let $g(x) = b_2x^2 + b_1x + b_0$ and form the Sylvester matrix

\[ S(f,g) = \begin{pmatrix} a_3 & a_2 & a_1 & a_0 & 0 \\ 0 & a_3 & a_2 & a_1 & a_0 \\ b_2 & b_1 & b_0 & 0 & 0 \\ 0 & b_2 & b_1 & b_0 & 0 \\ 0 & 0 & b_2 & b_1 & b_0 \end{pmatrix}. \]

Next assume that in the above, we actually have $a_3 = a_2 = 0$, and that $b_2 \ne 0$. Then the determinant of the above is given by

\[ \det S(f,g) = \det \begin{pmatrix} b_2 & b_1 & b_0 & 0 & 0 \\ 0 & b_2 & b_1 & b_0 & 0 \\ 0 & 0 & a_1 & a_0 & 0 \\ 0 & 0 & 0 & a_1 & a_0 \\ 0 & 0 & b_2 & b_1 & b_0 \end{pmatrix} = b_2^2\, R(a_1x + a_0,\; b_2x^2 + b_1x + b_0). \]

In general, assume that

\[ f(x) = a_n x^n + a_{n-1}x^{n-1} + \cdots + a_1 x + a_0 \]

and that

\[ g(x) = b_m x^m + b_{m-1}x^{m-1} + \cdots + b_1 x + b_0, \quad b_m \ne 0. \]

If we have $a_k \ne 0$ and that $a_{k+1} = a_{k+2} = \cdots = a_n = 0$, then one patterns an argument based on the above to arrive at the conclusion:


LEMMA 3. With hypotheses as above,

\[ \det S(f,g) = (-1)^{m(n-k)}\, b_m^{\,n-k}\, R(f,g), \]

where on the right $f$ is regarded as a polynomial of degree $k$.


Assume now that we are given polynomials $f(x) = a_n x^n + {}$ lower, and $g(x) = b_m x^m + {}$ lower, with $n \ge m$. The Sylvester matrix $S(f,g)$ has in rows 1 through $m$ the coefficients of $f(x)$ and in rows $m+1$ through $m+n$ it has the coefficients of $g(x)$. Note that for a constant $a$, adding $a$ times row $n+i$ to row $i$, for each $i = 1, \ldots, m$, will produce the Sylvester matrix $S(f + ag, g)$, whose determinant is unchanged. If $m < n$, then adding $a$ times row $n+i-1$ to row $i$, for each $i = 1, \ldots, m$, will produce the Sylvester matrix $S(f + axg, g)$ with the same determinant: $\det S(f + axg, g) = \det S(f,g)$. More generally, we see that as long as $k \le n - m$, then for any constant $a$, $\det S(f + ax^k g, g) = \det S(f,g)$. This easily implies the following very useful fact:


THEOREM 6. Given the polynomials $f(x)$, $g(x)$ with real coefficients of degrees $n \ge m$ (respectively), then for any polynomial $h(x)$ of degree $\le n - m$, $\det S(f + gh, g) = \det S(f,g)$.


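Now consider the monic trinomial of the form $f(x) = x^n + ax + b$ where $a$, $b$ are real numbers. Applying Theorem 5, we see that

\begin{align*}
\Delta(f) &= (-1)^{n(n-1)/2}\, R(f, f') \\
&= (-1)^{n(n-1)/2} \det S(x^n + ax + b,\; nx^{n-1} + a) \\
&= (-1)^{n(n-1)/2} \det S((a - a/n)x + b,\; nx^{n-1} + a) && \text{(Theorem 6)} \\
&= (-1)^{n(n-1)/2} (-1)^{n-1} n^{n-1} R((a - a/n)x + b,\; nx^{n-1} + a) && \text{(Lemma 3)} \\
&= (-1)^{n(n-1)/2} (-1)^{n-1} n^{n-1} (a - a/n)^{n-1} \bigl( n(-1)^{n-1} (b/(a - a/n))^{n-1} + a \bigr) && \text{(Corollary 3)} \\
&= (-1)^{n(n-1)/2} \bigl( n^n b^{n-1} + (-1)^{n-1} a^n (n-1)^{n-1} \bigr) \\
&= (-1)^{n(n-1)/2} \bigl( n^n b^{n-1} + (-1)^{n-1} (n-1)^{n-1} a^n \bigr).
\end{align*}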


EXAMPLE. Let $f(x) = \dfrac{3}{x} + x^3$, $x > 0$, and find the minimum value of $f(x)$. While such a problem is typically the province of differential calculus, we can use a discriminant argument to obtain the solution.




We let $m$ be the minimum value of $f(x)$ and note that the graph of $y = m$ must be tangent to the graph of $y = \dfrac{3}{x} + x^3$. This is equivalent to saying that the solution of $3 + x^4 = mx$ is a multiple root, forcing the discriminant of the quartic polynomial $q(x) = x^4 - mx + 3$ to be zero. That is to say, we need to find that value of $m$ making the discriminant equal to 0. From the above, we have that

\[ 0 = \Delta(q) = 4^4 \cdot 3^3 - 3^3 m^4 \;\Rightarrow\; m = 4. \]

In other words, the minimum value of $f(x)$ on the interval $(0, \infty)$ is 4. (The reader should check that the same result is obtained via differential calculus.)
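
The trinomial formula and the example can be sanity-checked in a few lines. This is only a sketch under stated assumptions: the function name `trinomial_discriminant` is hypothetical, and it hard-codes the closed form $\Delta(x^n + ax + b) = (-1)^{n(n-1)/2}\bigl(n^n b^{n-1} + (-1)^{n-1}(n-1)^{n-1}a^n\bigr)$ obtained above. The grid search at the end is a crude numerical cross-check, not a proof.

```python
def trinomial_discriminant(n, a, b):
    """Discriminant of x**n + a*x + b, using the closed form derived above."""
    sign = (-1) ** (n * (n - 1) // 2)
    return sign * (n ** n * b ** (n - 1)
                   + (-1) ** (n - 1) * (n - 1) ** (n - 1) * a ** n)

# Familiar special cases: a^2 - 4b for the quadratic, -4a^3 - 27b^2 for the cubic.
assert trinomial_discriminant(2, 3, 1) == 3 ** 2 - 4 * 1
assert trinomial_discriminant(3, -7, 6) == -4 * (-7) ** 3 - 27 * 6 ** 2

# The quartic q(x) = x^4 - m x + 3 of the example, with m = 4.
print(trinomial_discriminant(4, -4, 3))

# Crude cross-check: sample 3/x + x^3 on a grid in (0, 5] and take the minimum.
grid_minimum = min(3 / x + x ** 3 for x in (k / 1000 for k in range(1, 5001)))
print(grid_minimum)
```

The first printed value is the discriminant at $m = 4$, and the second is the smallest sampled value of $3/x + x^3$; comparing the two printouts with the example is the point of the exercise.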


Chapter 4 


Abstract Algebra 


While an oversimplification, abstract algebra grew out of an attempt 
to solve and otherwise understand polynomial equations (or systems of 
polynomial equations). A relative high point can be found in the early 
nineteenth century with E. Galois’ proof that polynomial equations 
of degree at least 5 need not be solvable by the “usual” processes of 
addition, subtraction, multiplication, division, and extraction of roots 
as applied to the polynomial’s coefficients. What’s remarkable is not so 
much the result itself but rather the methods employed. This marked 
the beginning of a new enterprise, now called group theory, which soon
took on a life of its own, quite apart from playing a role in polynomial
equations. 

The language and level of abstraction in group theory quickly be- 
gan to spread, leading to the somewhat larger discipline of abstract 
algebra. We'll attempt to give the serious student a meaningful intro- 
duction in this chapter. 


4.1 Basics of Set Theory 


In this section we shall consider some elementary concepts related to 
sets and their elements, assuming that at a certain level, the students 
have encountered the notions. In particular we wish to review (not 
necessarily in this order) 


• Element containment ($\in$)


• Containment relationships between sets ($\subseteq$, $\supseteq$, $\subset$ (same as $\subsetneq$), $\supset$ (same as $\supsetneq$))


• Operations on subsets of a given set: intersection ($\cap$), union ($\cup$), difference ($-$), and symmetric difference ($+$) of two subsets of a given set


• Set-theoretic constructions: power set ($2^S$), and Cartesian product ($S \times T$)


• Mappings (i.e., functions) between sets


• Relations and equivalence relations on sets


Looks scary, doesn’t it? Don’t worry, it’s all very natural... 


Before we launch into these topics, let’s get really crazy for a mo- 
ment. What we’re going to talk about is naive set theory. As opposed 
to what, you might ask? Well, here’s the point. When talking about 
sets, we typically use the language, 


“the set of all ...” 


Don’t we often talk like this? Haven’t you heard me say, “consider the 
set of all integers,” or “the set of all real numbers”? Maybe I’ve even 
asked you to think about the “set of all differentiable functions defined 
on the whole real line.” Surely none of this can possibly cause any 
difficulties! But what if we decide to consider something really huge, 
like the “set of all sets”? Despite the fact that this set is really big, 
it shouldn’t be a problem, should it? The only immediately peculiar
aspect of this set (let’s call it $B$, for “big”) is that not only $B \subseteq B$
(which is true for all sets), but also that $B \in B$. Since the set $\{1\} \notin \{1\}$,
we see that for a given set $A$, it may or may not happen that $A \in A$.
This leads us to consider, as did Bertrand Russell, the set of all sets
which don’t contain themselves as an element; in symbols we would
write this as

\[ R = \{ S \mid S \notin S \}. \]


This set $R$ seems strange, but is it really a problem? Well, let’s take a closer look, asking the question, is $R \in R$? By looking at the definition, we see that $R \in R$ if and only if $R \notin R$! This is impossible! This is a paradox, often called Russell’s paradox (or Russell’s Antinomy).


Conclusion: Naive set theory leads to paradoxes! So what do we do?
There are basically two choices: we could be much more careful and do 
axiomatic set theory, a highly formalized approach to set theory (I 
don’t care for the theory, myself!) but one that is free of such paradoxes. 
A more sensible approach for us is simply to continue to engage in naive 
set theory, trying to avoid sets that seem unreasonably large and hope 
for the best! 


4.1.1 Elementary relationships 


When dealing with sets naively, we shall assume that the statement “$x$ is an element of the set $A$” makes sense and shall symbolically denote this statement by writing $x \in A$. Thus, if $\mathbb{Z}$ denotes the set of integers, we can write such statements as $3 \in \mathbb{Z}$, $-11 \in \mathbb{Z}$, and so on. Likewise, $\pi$ is not an integer so we’ll express this by writing $\pi \notin \mathbb{Z}$.


In the vast majority of our considerations we shall be considering 
sets in a given “context,” i.e., as subsets of a given set. Thus, when I 
speak of the set of integers, I am usually referring to a particular subset 
of the real numbers. The point here is that while we might not really 
know what a real number is (and therefore we don’t really “understand” 
the set of real numbers), we probably have a better understanding of 
the particular subset consisting of integers (whole numbers). Anyway, 
if we denote by $\mathbb{R}$ the set of all real numbers and write $\mathbb{Z}$ for the subset of integers, then we can say that

\[ \mathbb{Z} = \{ x \in \mathbb{R} \mid x \text{ is a whole number} \}. \]

Since $\mathbb{Z}$ is a subset of $\mathbb{R}$ we have the familiar notation $\mathbb{Z} \subseteq \mathbb{R}$; if we wish to emphasize that they’re different sets (or that $\mathbb{Z}$ is properly contained in $\mathbb{R}$), we write $\mathbb{Z} \subsetneq \mathbb{R}$ (some authors¹ write $\mathbb{Z} \subset \mathbb{R}$). Likewise, if we let $\mathbb{C}$ be the set of all complex numbers, and consider also the set $\mathbb{Q}$ of all rational numbers, then we obviously have

\[ \mathbb{Z} \subseteq \mathbb{Q} \subseteq \mathbb{R} \subseteq \mathbb{C}. \]

¹ like me, but the former seems more customary in the high-school context.


As a more geometrical sort of example, let us consider the set $\mathbb{R}^3$ of all points in Cartesian 3-dimensional space. There are certain naturally defined subsets of $\mathbb{R}^3$, the lines and the planes. Thus, if $\Pi$ is a plane in $\mathbb{R}^3$, and if $L$ is a line contained in $\Pi$, then of course we may write either $L \subseteq \Pi \subseteq \mathbb{R}^3$ or $L \subsetneq \Pi \subsetneq \mathbb{R}^3$. Note, of course, that $\mathbb{R}^3$ has far more subsets than just the subsets of lines and planes!


One more example might be instructive here. First of all, if $A$ is a finite set, we shall denote by $|A|$ the number of elements in $A$. We often call $|A|$ the cardinality or order of the set $A$. Now consider the finite set $S = \{1, 2, 3, \ldots, 8\}$ (and so $|S| = 8$) and ask how many subsets (including $S$ and the empty set $\emptyset$) are contained in $S$. As you might remember, there are $2^8$ such subsets, and this can be shown in at least two ways. The most direct way of seeing this is to form subsets of $S$ by the following process:


  1       2       3       4       5       6       7       8
 yes     yes     yes     yes     yes     yes     yes     yes
or no   or no   or no   or no   or no   or no   or no   or no


where in the above table, a subset is formed by a sequence of yes’s or no’s according as to whether or not the corresponding element is in the subset. Therefore, the subset $\{3, 6, 7, 8\}$ would correspond to the sequence


(no, no, yes, no, no, yes, yes, yes).


This makes it already clear that since for each element there are two choices (“yes” or “no”), then there must be

\[ 2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 \times 2 = 2^8 \]

possibilities in all.
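
The yes/no process can be imitated directly on a computer. The snippet below is a small illustrative sketch (the name `subset_from_choices` is made up for this purpose): each subset of $S$ is built from a tuple of eight True/False choices, one per element.

```python
from itertools import product

S = [1, 2, 3, 4, 5, 6, 7, 8]

def subset_from_choices(choices):
    """Keep exactly those elements of S whose choice is 'yes' (True)."""
    return {x for x, keep in zip(S, choices) if keep}

# Every sequence of eight yes/no choices gives one subset of S.
all_choices = list(product([False, True], repeat=len(S)))
subsets = [subset_from_choices(choice) for choice in all_choices]

# (no, no, yes, no, no, yes, yes, yes) from the text:
example = subset_from_choices((False, False, True, False, False, True, True, True))
print(sorted(example))
print(len(subsets), 2 ** 8)
```

Different choice sequences give different subsets, so the length of `subsets` is the number of subsets of $S$.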


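Another way to count the subsets of the above set is to do this:

\begin{align*}
\text{Number of subsets} &= \text{number of subsets of size 0} \\
&\quad + \text{number of subsets of size 1} \\
&\quad + \text{number of subsets of size 2} \\
&\quad + \text{number of subsets of size 3} \\
&\quad + \text{number of subsets of size 4} \\
&\quad + \text{number of subsets of size 5} \\
&\quad + \text{number of subsets of size 6} \\
&\quad + \text{number of subsets of size 7} \\
&\quad + \text{number of subsets of size 8} \\
&= \binom{8}{0} + \binom{8}{1} + \binom{8}{2} + \binom{8}{3} + \binom{8}{4} + \binom{8}{5} + \binom{8}{6} + \binom{8}{7} + \binom{8}{8} \\
&= \sum_{k=0}^{8} \binom{8}{k} = (1 + 1)^8 = 2^8,
\end{align*}

where we have applied the Binomial Theorem.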


In general, if $A$ is any set, we denote by $2^A$ the set of all subsets of $A$, often called the power set of $A$. (Many authors denote this set by $\mathcal{P}(A)$.) We’ll have more to say about the power set later. At any rate, we showed above that if $S = \{1, 2, 3, \ldots, 8\}$, then $|2^S| = 2^8$. The obvious generalization is this:


Theorem. Let $A$ be a finite set with cardinality $n$. Then $2^A$ has cardinality $2^n$. Symbolically,

\[ |2^A| = 2^{|A|}. \]
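
For small sets the theorem can be confirmed by brute force. The following sketch (with a hypothetical helper `power_set`) lists the subsets of $\{1, \ldots, n\}$ size by size, which mirrors the binomial-coefficient count, and compares the total with $2^n$ for a few small $n$.

```python
from itertools import combinations
from math import comb

def power_set(elements):
    """All subsets of `elements`, listed by increasing size."""
    items = list(elements)
    return [set(c) for k in range(len(items) + 1) for c in combinations(items, k)]

for n in range(6):
    A = range(1, n + 1)
    subsets = power_set(A)
    by_size = sum(comb(n, k) for k in range(n + 1))   # sum of binomial coefficients
    print(n, len(subsets), by_size, 2 ** n)
```

Each printed row shows $n$, the number of subsets found, the binomial sum, and $2^n$.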


EXERCISES 


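1. Let $p$ be a prime number and define the following subset of the rational numbers $\mathbb{Q}$:

\[ \mathbb{Q}_{(p)} = \left\{ \tfrac{r}{s} \in \mathbb{Q} \;\middle|\; \text{the fraction } \tfrac{r}{s} \text{ is in lowest terms, and } p \text{ doesn't evenly divide } s \right\}. \]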
Determine which of the following real numbers are in Qi):


2 10 3 
™ 3) 9? cos(m/4), 12, —, 127, . 


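2. True or false: $\mathbb{Z} \subseteq \mathbb{Q}_{(p)}$ for any prime number $p$.


3. Consider the set $S = \{1, 2, 3, \ldots, 10\}$. Define the sets

\[ A = \{ \text{subsets } T \subseteq S \mid |T| = 2 \} \]
\[ B = \{ \text{subsets } T \subseteq S \mid |T| = 2, \text{ and if } x, y \in T \text{ then } |x - y| > 2 \} \]

Compute $|A|$ and $|B|$.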


4. Given the real number $x$, denote by $[x]$ the largest integer $n$ not exceeding $x$. Therefore, we have, for example, that $[4.3] = 4$, $[\pi] = 3$, $[e] = 2$, $[-\pi] = -4$, and $\left[\tfrac{10}{3}\right] = 3$. Define the set $A$ of integers by setting


ae {| saa) Fea al _ 105) 10} 


and compute $|A|$.


4.1.2 Elementary operations on subsets of a given set 


Let $A$ and $B$ be subsets of some bigger set $U$ (sometimes called the universal set; note that $U$ shall just determine a context for the ensuing constructions). We have the familiar union and intersection, respectively, of these subsets:

\[ A \cup B = \{ u \in U \mid u \in A \text{ or } u \in B \}, \quad\text{and} \]

\[ A \cap B = \{ u \in U \mid u \in A \text{ and } u \in B \}. \]

I’m sure that you’re reasonably comfortable with these notions. Two other important constructions are the difference and complement, respectively:

\[ A - B = \{ u \in U \mid u \in A \text{ but } u \notin B \}, \quad\text{and} \]

\[ A' = \{ u \in U \mid u \notin A \} = U - A. \]


Relationships and operations regarding subsets are often symbolically represented through the familiar Venn diagram. For example, in the Venn diagram below, the student should have no difficulty in coloring in any one of the subsets $A \cup B$, $A \cap B$, $A - B$, $B - A$, $A'$ (or any others that might come to mind!)


Venn diagrams can be useful in identifying properties of the above operations. One very typical example of such relationships and their Venn diagram proofs are the De Morgan Laws: for subsets $A$ and $B$ of a universal set $U$, one has

\[ (A \cup B)' = A' \cap B' \quad\text{and}\quad (A \cap B)' = A' \cup B'. \]

You can convince yourself of these facts by coloring in the Venn diagrams:

Actually, though, the De Morgan Laws are hardly surprising. If $A$ represents “it will rain on Monday,” and $B$ represents “it will rain on Tuesday,” then “it will not rain on Monday or Tuesday” is represented by $(A \cup B)'$, which is obviously the same as “it won’t rain on Monday and it won’t rain on Tuesday,” represented mathematically by $A' \cap B'$.


A more formal proof might run along the following lines. In proving that $S = T$ for two sets $S$ and $T$, it is often convenient to prove that $S \subseteq T$ and that $T \subseteq S$.


Theorem. For subsets $A$ and $B$ of a given set $U$, $(A \cup B)' = A' \cap B'$.


Proof. Let $x \in (A \cup B)'$. Then $x$ is not in $A \cup B$, which means that $x$ is not in $A$ and that $x$ is not in $B$, i.e., $x \in A' \cap B'$. This proves that $(A \cup B)' \subseteq A' \cap B'$. Conversely, if $x \in A' \cap B'$, then $x$ is not in $A$ and $x$ is not in $B$, and so $x$ is not in $A \cup B$. But this says that $x \in (A \cup B)'$, proving that $A' \cap B' \subseteq (A \cup B)'$. It follows, therefore, that $(A \cup B)' = A' \cap B'$.
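
Python's built-in `set` type offers a quick way to experiment with the De Morgan Laws. The sketch below is purely illustrative: the universal set `U` and the subsets `A` and `B` are arbitrary choices made here, and `complement` is a hypothetical helper taking complements relative to `U`.

```python
U = set(range(1, 11))                     # a small universal set, chosen for illustration
A = {u for u in U if u % 2 == 0}          # even elements of U
B = {u for u in U if u % 3 == 0}          # multiples of 3 in U

def complement(X):
    """Complement of X relative to the universal set U."""
    return U - X

left, right = complement(A | B), complement(A) & complement(B)
print(sorted(left), sorted(right))                          # (A u B)' and A' n B'
print(complement(A & B) == complement(A) | complement(B))   # second De Morgan law
```

A check on one pair of subsets is of course no substitute for the proof above; it only shows the two sides agreeing in a concrete case.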


There are two other important results related to unions and intersections, both of which are somewhat less obvious than the De Morgan laws. Let’s summarize these results as a theorem:


Theorem. Let $A$, $B$, and $C$ be subsets of some universal set $U$. Then we have two “distributive laws:”

\[ A \cap (B \cup C) = (A \cap B) \cup (A \cap C), \quad\text{and}\quad A \cup (B \cap C) = (A \cup B) \cap (A \cup C). \]

Proof. As you might expect the above can be easily demonstrated through Venn diagrams (see Exercise 1 below). Here, I’ll give a formal proof of the first result (viz., that “intersection distributes over union”). Let $x \in A \cap (B \cup C)$ and so $x \in A$ and $x \in B \cup C$. From this we see that either $x \in A$ and $x \in B$ or that $x \in A$ and $x \in C$, which means, of course, that $x \in (A \cap B) \cup (A \cap C)$, proving that $A \cap (B \cup C) \subseteq (A \cap B) \cup (A \cap C)$. Conversely, if $x \in (A \cap B) \cup (A \cap C)$, then $x \in A \cap B$ or $x \in A \cap C$. In either case $x \in A$, but also $x \in B \cup C$, which means that $x \in A \cap (B \cup C)$, proving that $(A \cap B) \cup (A \cap C) \subseteq A \cap (B \cup C)$. It follows that $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$. The motivated student will have no difficulty in likewise providing a formal proof of the second distributive law.
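
For a small universal set, both distributive laws can even be verified exhaustively. The following sketch assumes a four-element universal set (an arbitrary choice) and uses hypothetical helper names; it runs through every triple of subsets.

```python
from itertools import combinations

def all_subsets(universe):
    """Every subset of `universe`, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]

U = {1, 2, 3, 4}
subsets = all_subsets(U)          # 16 subsets, hence 16**3 triples (A, B, C)

def distributive_laws_hold(A, B, C):
    first = A & (B | C) == (A & B) | (A & C)     # intersection over union
    second = A | (B & C) == (A | B) & (A | C)    # union over intersection
    return first and second

checked = [distributive_laws_hold(A, B, C)
           for A in subsets for B in subsets for C in subsets]
print(len(checked), all(checked))
```

An exhaustive check on one finite universal set is still only evidence, not a proof; the element-chasing argument above is what works for arbitrary $U$.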


EXERCISES 


1. Give Venn diagram proofs of the distributive laws:

\[ A \cap (B \cup C) = (A \cap B) \cup (A \cap C), \quad\text{and}\quad A \cup (B \cap C) = (A \cup B) \cap (A \cup C). \]


2. Show that if $A, B \subseteq U$, then $A - B = A \cap B'$.


3. Use a Venn diagram argument to show that if $A, B, C \subseteq U$, then

\[ A - (B \cup C) = (A - B) \cap (A - C) \quad\text{and}\quad A - (B \cap C) = (A - B) \cup (A - C). \]


4. Show that if $A, B \subseteq U$, and if $A$ and $B$ are finite subsets, then $|A \cup B| = |A| + |B| - |A \cap B|$.


5. Show that if $A$, $B$, and $C \subseteq U$, and if $A$, $B$, and $C$ are finite subsets, then

\[ |A \cup B \cup C| = |A| + |B| + |C| - |A \cap B| - |A \cap C| - |B \cap C| + |A \cap B \cap C|. \]


6. Try to generalize Exercise 5 above.²


7. (Compare with Exercise 3 of Subsection 4.1.1) Consider the set $S = \{1, 2, 3, \ldots, 10\}$, and define the sets

\[ T = \{ \text{ordered pairs } (X, Y) \text{ of subsets } X, Y \subseteq S, \text{ with } |X| = |Y| = 2 \text{ and } X \cap Y = \emptyset \} \]

\[ T' = \{ \text{subsets } \{X, Y\} \subseteq 2^S \mid |X| = |Y| = 2 \text{ and } X \cap Y = \emptyset \} \]

Compute $|T|$ and $|T'|$.


8. In this problem the universal set is the real line $\mathbb{R}$. Find $A \cup B$, $A \cap B$, $A - B$, $B - A$, and $(A \cup B)'$, where $A = \,]{-10}, 5]$ and $B = [-4, \pi]$.


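9. In this problem the universal set is the Cartesian plane $\mathbb{R}^2 = \{ (x,y) \mid x, y \in \mathbb{R} \}$. Define the subsets

\[ A = \{ (x,y) \mid x^2 + y^2 \le 1 \} \quad\text{and}\quad B = \{ (x,y) \mid y \ge x^2 \}. \]

Sketch the following sets as subsets of $\mathbb{R}^2$: $A \cup B$, $A \cap B$, $A - B$, $B - A$, and $(A \cup B)'$.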


10. Let $A, B \subseteq U$ and define the symmetric difference of $A$ and $B$ by setting

\[ A + B = (A \cup B) - (A \cap B). \]

Using Venn diagram arguments, show the distributive laws

\[ A + B = (A - B) \cup (B - A), \]
\[ A \cap (B + C) = (A \cap B) + (A \cap C), \text{ where } A, B, C \subseteq U, \]
\[ A + (B \cap C) = (A + B) \cap (A + C), \text{ where } A, B, C \subseteq U. \]


² This is the classical principle of Inclusion-Exclusion.


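11. Let $p$ be a fixed prime and let $\mathbb{Q}_{(p)}$ be the set defined in Exercise 1 of Subsection 4.1.1. Interpret and prove the statement that

\[ \bigcap_{\text{all primes } p} \mathbb{Q}_{(p)} = \mathbb{Z}. \]


12. Interpret and prove the statements

(i) $\displaystyle \bigcap_{n=1}^{\infty} \left[0, \tfrac{1}{n}\right] = \{0\}$

(ii) $\displaystyle \bigcap_{n=1}^{\infty} \left]0, \tfrac{1}{n}\right] = \emptyset$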

4.1.3 Elementary constructions—new sets from old 


We have already encountered an elementary construction on a given set: that of the power set. That is, if $S$ is a set, then $2^S$ is the set of all subsets of the set $S$. Furthermore, we saw in the theorem of Subsection 4.1.1 that if $S$ is a finite set containing $n$ elements, then the power set $2^S$ contains $2^n$ elements (which motivates the notation in the first place!). Next, let $A$ and $B$ be sets. We form the Cartesian product $A \times B$ to be the set of all ordered pairs of elements $(a,b)$ formed by elements of $A$ and $B$, respectively. More formally,

\[ A \times B = \{ (a,b) \mid a \in A \text{ and } b \in B \}. \]

From the above, we see that we can regard the Cartesian plane $\mathbb{R}^2$ as the Cartesian product of the real line $\mathbb{R}$ with itself: $\mathbb{R}^2 = \mathbb{R} \times \mathbb{R}$. Similarly, Cartesian 3-space $\mathbb{R}^3$ is just $\mathbb{R} \times \mathbb{R} \times \mathbb{R}$.


Here are a couple of constructions to think about. Perhaps you can see how a right circular cylinder of height $h$ and radius $r$ can be regarded as $S \times [0, h]$, where $S$ is a circle of radius $r$. Next, can you see how the product $S \times S$ of two circles could be identified with the torus (the surface of a doughnut)?³


Finally, it should be obvious that if $A$ and $B$ are finite sets, then $|A \times B| = |A| \cdot |B|$.
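
For finite sets the Cartesian product is easy to generate by machine. In the sketch below the two sets `A` and `B` are arbitrary examples chosen here; `itertools.product` yields the ordered pairs $(a, b)$ with $a \in A$ and $b \in B$.

```python
from itertools import product

A = {1, 2, 3}
B = {"x", "y"}

AxB = set(product(A, B))                 # all ordered pairs (a, b)
print(sorted(AxB))
print(len(AxB), len(A) * len(B))         # |A x B| and |A| * |B|
print((1, "x") in AxB, ("x", 1) in AxB)  # order matters in an ordered pair
```

The last line is a reminder that $A \times B$ and $B \times A$ are different sets in general, even though they have the same number of elements.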


EXERCISES 


1. Let $n$ be a positive integer and let $S = \{1, 2, \ldots, n\}$. Define the subset $T \subseteq S \times S$ by $T = \{ (a,b) \in S \times S \mid |a - b| = 1 \}$. Compute $|T|$ as a function of $n$.


2. Let $n$ be a positive integer and let $S$ be as above. Define the subset $Z \subseteq S \times S \times S$ by $Z = \{ (a,b,c) \in S \times S \times S \mid a, b, c \text{ are all distinct} \}$. Compute $|Z|$ as a function of $n$.


3. Let $X$ and $Y$ be sets, and let $C, D \subseteq Y$. Prove that $X \times (C \cup D) = (X \times C) \cup (X \times D)$.


4. Let $X$ and $Y$ be sets, let $A, B \subseteq X$ and let $C, D \subseteq Y$. Is it always true that

\[ (A \cup B) \times (C \cup D) = (A \times C) \cup (B \times D)? \]


5. Let $T$ and $T'$ be the sets defined in Exercise 7 of Subsection 4.1.2.
Which of the following statements are true: 


PEC. TCS ey, Teo... FCP 
PCS SS. TCS SS. Teo. Fa 


³ Here’s a parametrization of the torus which you might find interesting. Let $R$ and $r$ be positive real numbers with $r < R$. The following parametric equations describe a torus of outer radius $r + R$ and inner radius $R - r$:

\[ x = (R + r\cos\phi)\cos\theta, \qquad y = (R + r\cos\phi)\sin\theta, \qquad z = r\sin\phi. \]
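4.1.4 Mappings between sets


Let $A$ and $B$ be sets. A mapping from $A$ to $B$ is simply a function from $A$ to $B$; we often express this by writing $f : A \to B$ or $A \xrightarrow{\ f\ } B$. Let’s give some examples (some very familiar):


• $f : \mathbb{R} \to \mathbb{R}$ is given by $f(x) = x^2 - x + 1$, $x \in \mathbb{R}$

• $f : \mathbb{R} \to \mathbb{C}$ is given by $f(x) = (x - 1) + ix^2$, $x \in \mathbb{R}$


• Let $\mathbb{Z}^+ \subseteq \mathbb{Z}$ be the set of positive integers and define $g : \mathbb{Z}^+ \to \mathbb{R}$ by $g(n) = \cos(2\pi/n)$, $n \in \mathbb{Z}^+$


• $h : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is given by $h(x,y) = x - y$, $x, y \in \mathbb{R}$

• $\gamma : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is given by $\gamma(x,y) = x^2 + y^2$

• $q : \mathbb{Z} \to \mathbb{Z}$ is given by $q(n) = \tfrac{1}{2}(n^2 + n)$, $n \in \mathbb{Z}$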


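• $\mu : \mathbb{Z}^+ \to \{-1, 0, 1\}$ is given by

\[ \mu(n) = \begin{cases} 1 & \text{if } n \text{ is the product of an even number of distinct primes} \\ -1 & \text{if } n \text{ is the product of an odd number of distinct primes} \\ 0 & \text{if } n \text{ is not the product of distinct primes} \end{cases} \]

Thus, for example, $\mu(1) = 1$, since 1 is the empty product (a product of zero primes). Also, $\mu(6) = 1$, as $6 = 2 \cdot 3$, the product of two distinct primes. Likewise, $\mu(5) = \mu(30) = -1$, and $\mu(18) = 0$.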


• $h : \mathbb{R} \times \mathbb{R} \to \mathbb{C}$ is given by $h(x,y) = x + iy$, $x, y \in \mathbb{R}$


• $\sigma : \{1, 2, 3, 4, 5, 6\} \to \{1, 2, 3, 4, 5, 6\}$ is represented by


| eae ei 
o:}Ltliilid 
2 Br, 1G 
If $f : A \to B$ is a mapping we call $A$ the domain of $f$ and call $B$ the codomain of $f$. The range of $f$ is the subset $\{ f(a) \mid a \in A \} \subseteq B$.


Some definitions. Let $A$ and $B$ be sets and let $f : A \to B$. We say that


$f$ is one-to-one (or is injective) if whenever $x, y \in A$, $x \ne y$ then $f(x) \ne f(y)$. (This is equivalent to saying that $f(x) = f(y) \Rightarrow x = y$, where $x, y \in A$.)


$f$ is onto (or is surjective) if for any $z \in B$ there is an element $x \in A$ such that $f(x) = z$. Put differently, $f$ is onto if the range of $f$ is all of $B$.


$f$ is bijective if $f$ is both one-to-one and onto.


The following definition is extremely useful. Let $A$ and $B$ be sets and let $f : A \to B$ be a mapping. Let $b \in B$; the fibre of $f$ over $b$, written $f^{-1}(b)$, is the set

\[ f^{-1}(b) = \{ a \in A \mid f(a) = b \} \subseteq A. \]

Please do not confuse fibres with anything having to do with the inverse function $f^{-1}$, as this might not exist! Note that if $b \in B$ then the fibre over $b$ might be the empty set. However, if we know that $f : A \to B$ is onto, then the fibre over each element of $B$ is nonempty. If, in fact, for each $b \in B$ the fibre $f^{-1}(b)$ over $b$ consists of a single element, then we are guaranteed that $f$ is a bijection.


Finally, a mapping $f : A \to A$ from a set into itself is called a permutation if it is a bijection. It should be clear that if $|A| = n$, then there are $n!$ permutations of $A$.
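
For mappings between finite sets, all of these notions can be tested mechanically. The sketch below is an illustration only: a mapping is modelled as a Python dictionary, the helper names `is_injective`, `is_surjective` and `fibre` are hypothetical, and the squaring map on a five-element set is an example chosen here.

```python
from itertools import permutations
from math import factorial

def is_injective(f):
    """f is a dict a -> f(a); it is one-to-one iff no value is hit twice."""
    return len(set(f.values())) == len(f)

def is_surjective(f, codomain):
    """f is onto iff its range is the whole codomain."""
    return set(f.values()) == set(codomain)

def fibre(f, b):
    """The fibre of f over b: every a in the domain with f(a) == b."""
    return {a for a, value in f.items() if value == b}

A = [-2, -1, 0, 1, 2]
B = [0, 1, 2, 3, 4]
square = {a: a * a for a in A}            # a -> a^2, viewed as a mapping A -> B

print(is_injective(square), is_surjective(square, B))
print(fibre(square, 4), fibre(square, 3)) # a two-element fibre and an empty one

# Every rearrangement of {0, 1, 2, 3} is a bijection of that set onto itself.
perms = [dict(zip(range(4), p)) for p in permutations(range(4))]
count = len(perms)
print(count, factorial(4))
print(all(is_injective(p) and is_surjective(p, range(4)) for p in perms))
```

The squaring map shows both failures at once: two elements share a fibre (so it is not one-to-one) and some fibres are empty (so it is not onto).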


EXERCISES 


1. Let $f : \mathbb{R} \to \mathbb{R}$ be a quadratic function. Therefore, there are real constants $a, b, c \in \mathbb{R}$ with $a \ne 0$ such that $f(x) = ax^2 + bx + c$ for all $x \in \mathbb{R}$. Prove that $f$ cannot be either injective or surjective.


2. Suppose that $f : \mathbb{R} \to \mathbb{R}$ is a cubic function such that $f'(x) \ne 0$ for all $x \in \mathbb{R}$. Give an intuitive argument (I’m not asking for a formal proof) that $f$ must be bijective.


3. Define $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ by setting $f(x,y) = x - y$. Show that $f$ is onto but is not one-to-one.


4. Define $f : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ by setting $f(x,y) = x^2 + y$. Show that $f$ is onto but is not one-to-one.


5. Let $f : \mathbb{C} \to \mathbb{C}$ be a quadratic function. Therefore, there are complex constants $a, b, c \in \mathbb{C}$ with $a \ne 0$ such that $f(x) = ax^2 + bx + c$ for all $x \in \mathbb{C}$. Prove that $f$ is onto but not one-to-one. (Compare with Exercise 1, above.)


6. For the mapping given in Exercise 3, above, show that the fibre over each point in $\mathbb{R}$ is a line in $\mathbb{R} \times \mathbb{R}$.


7. What are the fibres of the mapping in Exercise 4?


8. (A guided exercise) Let $A$ be a set and let $2^A$ be its power set. Let’s show that there cannot exist any surjective function $f : A \to 2^A$. A good way to proceed is to argue by contradiction, which means that we’ll assume that, in fact, a surjective function exists and then reach a contradiction! So let’s assume that $f : A \to 2^A$ is surjective. Note first that for any element $a \in A$, it may or may not happen that $a \in f(a)$ (this is important!). Now consider the following strange subset of $A$:

\[ A_0 = \{ a \in A \mid a \notin f(a) \} \in 2^A. \]

Is $A_0 = f(a_0)$ for some element $a_0 \in A$? Think about it! This contradiction has the same flavor as Russell’s paradox!
4.1.5 Relations and equivalence relations


Let $S$ be a set. A relation $R$ on $S$ is simply a subset of $S \times S$. Nothing more, nothing less. If $(x,y) \in R \subseteq S \times S$, then we typically write $xRy$ and say that $x$ is related to $y$. A few examples might clarify this.


(i) Let $R$ be the relation “$<$” on the set $\mathbb{R}$ of real numbers. Therefore, $R = \{ (x,y) \in \mathbb{R} \times \mathbb{R} \mid x < y \}$.


(ii) Fix a positive integer $m$ and recall that if $a \in \mathbb{Z}$ then $m \mid a$ means that $a$ is a multiple of $m$. Now let $R$ be the relation on the set $\mathbb{Z}$ of integers defined by

\[ aRb \iff m \mid (a - b). \]

Note that we already met this relation in Section 2.1.3.


This relation is, as we have seen, customarily denoted “mod $m$” and read “congruence modulo $m$.” Thus if $m = 7$, then we can say that “$1 \equiv 15 \pmod 7$” where we read this as “1 is congruent to 15 modulo 7.”


Note, in particular, that if $m = 7$ then the integers which are congruent modulo 7 to $-1$ are precisely those of the form $-1 + 7k$, $k = 0, \pm 1, \pm 2, \ldots$.


(iii) Let $S = \{1, 2, 3, 4, 5, 6\}$. We may express a relation $R$ on $S$ by specifying a matrix $P$ containing 0s and 1s and where the rows and columns are labeled by the elements of $S$ in the order 1, 2, 3, 4, 5, and 6 and where a “1” in row $i$ and column $j$ designates that $iRj$. More specifically, let’s consider the relation defined by the matrix:

\[ \begin{pmatrix} 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix} \]

In this example we see that $sRs$ for all $s \in S$. You should have no trouble in writing down all the correct relational expressions $xRy$.


(iv) Here’s one of my favorite examples. Let $T_5$ be the set of all 2-element subsets of $\{1, 2, 3, 4, 5\}$ and define the relation $R$ on $T_5$ by stipulating that $A_1 R A_2 \iff A_1 \cap A_2 = \emptyset$. Can you compute $|R|$ in this example (see Exercise 3, below)? Can you reformulate this in terms of an appropriate graph, having $T_5$ as its set of vertices?


(v) Define the relation $R$ on the real line $\mathbb{R}$ by stipulating that $xRy \iff x - y \in \mathbb{Z}$. What are the elements related to 1?


Let $R$ be a relation on a set $S$. We say that $R$ is an equivalence relation if the following three properties hold:


$R$ is reflexive: $sRs$ for any $s \in S$;

$R$ is symmetric: $s_1 R s_2 \iff s_2 R s_1$, $s_1, s_2 \in S$;

$R$ is transitive: $s_1 R s_2$ and $s_2 R s_3 \Rightarrow s_1 R s_3$, $s_1, s_2, s_3 \in S$.


Of the five examples given above, the relations (ii), (iii), and (v) are equivalence relations. The relation given in (i) is neither reflexive (since $x < x$ is false for all real numbers $x$) nor is it symmetric ($1 < 2$ but $2 \not< 1$). This relation is, however, transitive (easy to check!). The analysis of the remaining cases is left to the exercises.


Example (iii) is a bit different from the others, which warrants a few extra words. Two (almost) obvious facts are the following: Since the matrix has all 1s down the diagonal, this already proves the reflexivity of the relation. Next, the matrix is symmetric which proves that the relation is symmetric. How about the transitivity? This involves a bit more work, but a bit of thought reveals the following. If $P$ denotes the above matrix, and if $P^2$ has nonzero entries in exactly the same places as $P$, then the relation is also transitive.
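
The matrix tests for the three properties are easy to automate. The sketch below is illustrative only: the helper names are hypothetical, and the two relations used (congruence modulo 3 and “$<$” on $\{1, \ldots, 6\}$) are stand-ins chosen here, modelled on examples (ii) and (i). The transitivity test checks that every nonzero entry of $P^2$ sits where $P$ is nonzero; for a reflexive relation this is the same as $P^2$ and $P$ having nonzero entries in exactly the same places.

```python
def relation_matrix(S, related):
    """0/1 matrix P with P[i][j] = 1 exactly when S[i] is related to S[j]."""
    return [[1 if related(x, y) else 0 for y in S] for x in S]

def is_reflexive(P):
    return all(P[i][i] == 1 for i in range(len(P)))

def is_symmetric(P):
    n = len(P)
    return all(P[i][j] == P[j][i] for i in range(n) for j in range(n))

def is_transitive(P):
    """True iff every nonzero entry of P*P lies where P itself is nonzero."""
    n = len(P)
    for i in range(n):
        for j in range(n):
            two_step = any(P[i][k] and P[k][j] for k in range(n))
            if two_step and not P[i][j]:
                return False
    return True

S = [1, 2, 3, 4, 5, 6]
congruent_mod_3 = relation_matrix(S, lambda a, b: (a - b) % 3 == 0)
less_than = relation_matrix(S, lambda a, b: a < b)

for name, P in [("mod 3", congruent_mod_3), ("<", less_than)]:
    print(name, is_reflexive(P), is_symmetric(P), is_transitive(P))
```

The two printed rows reproduce the analysis above: congruence passes all three tests, while “$<$” passes only the transitivity test.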


Let $S$ be a set and let $R$ be an equivalence relation on $S$. For any element $s \in S$ we denote by $[s]$ the set

\[ [s] = \{ s' \in S \mid sRs' \} \subseteq S, \]

and call this set the equivalence class in $S$ containing $s \in S$. Note that if $s_1 R s_2$ then $[s_1] = [s_2]$ because $s_1$ and $s_2$ are equivalent to exactly the same elements of $S$.


Proposition. Let $R$ be an equivalence relation on the set $S$ and let $[s]$ and $[s']$ be two equivalence classes in $S$. Then either $[s] = [s']$, in which case $sRs'$, or $[s] \cap [s'] = \emptyset$, in which case $s$ is not related to $s'$.


Proof. Assume that $[s] \cap [s'] \ne \emptyset$, say that there is some element $t \in [s] \cap [s']$. Therefore $sRt$ and $s'Rt$ which implies by symmetry that $sRt$ and $tRs'$. Using transitivity, we see that $sRs'$ which means that $s$, $s'$ are equivalent to exactly the same elements of $S$. From this it follows that $[s] = [s']$. The only other possibility is that $[s] \cap [s'] = \emptyset$, in which case one obviously has that $s$ is not related to $s'$.


As a result of the above proposition we see that an equivalence relation $R$ on a set $S$ partitions the set into disjoint equivalence classes. In light of this, let’s take a look at a few examples.


(i) Consider the equivalence relation “$\equiv \pmod 7$” on the set $\mathbb{Z}$ of integers. We have the following decomposition of $\mathbb{Z}$ into exactly 7 equivalence classes:

\begin{align*}
[0] &= \{\ldots, -14, -7, 0, 7, 14, \ldots\} \\
[1] &= \{\ldots, -13, -6, 1, 8, 15, \ldots\} \\
[2] &= \{\ldots, -12, -5, 2, 9, 16, \ldots\} \\
[3] &= \{\ldots, -11, -4, 3, 10, 17, \ldots\} \\
[4] &= \{\ldots, -10, -3, 4, 11, 18, \ldots\} \\
[5] &= \{\ldots, -9, -2, 5, 12, 19, \ldots\} \\
[6] &= \{\ldots, -8, -1, 6, 13, 20, \ldots\}
\end{align*}


(ii) Let $R$ be the relation on $\mathbb{R}$ given by $xRy \iff x - y \in \mathbb{Q}$. This is easily shown to be an equivalence relation, as follows. First $xRx$ as $x - x = 0 \in \mathbb{Q}$. Next, if $xRy$, then $x - y \in \mathbb{Q}$ and so $y - x = -(x - y) \in \mathbb{Q}$, i.e., $yRx$. Finally, assume that $xRy$ and that $yRz$. Then $x - y,\ y - z \in \mathbb{Q}$ and so $x - z = (x - y) + (y - z) \in \mathbb{Q}$ and so $xRz$. Note that the equivalence class containing the real number $x$ is $\{ x + r \mid r \in \mathbb{Q} \}$.


(iii) Define the function $f : \mathbb{R}^2 \to \mathbb{R}$ by setting $f(x,y) = x - y$. Define an equivalence relation on $\mathbb{R}^2$ by stipulating that $(x_1, y_1) R (x_2, y_2) \iff f(x_1, y_1) = f(x_2, y_2)$. Note that this is the same as saying that $x_1 - y_1 = x_2 - y_2$. Thus, the equivalence classes are nothing more than the fibres of the mapping $f$. We can visualize these equivalence classes by noting that the above condition can be expressed as $\dfrac{y_2 - y_1}{x_2 - x_1} = 1$, which says that the equivalence classes are precisely the various lines of slope 1 in the Cartesian plane $\mathbb{R}^2$.
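
On a finite window of values, examples (i) and (iii) can be reproduced by grouping elements according to the value of a function, so that each group is a fibre. This is only a sketch: `equivalence_classes` is a hypothetical helper, and the window of integers and the 3-by-3 grid of points are arbitrary choices made here.

```python
from collections import defaultdict

def equivalence_classes(elements, f):
    """Group elements by the value of f; each group is one fibre of f."""
    classes = defaultdict(list)
    for x in elements:
        classes[f(x)].append(x)
    return dict(classes)

# Example (i): integers from -14 to 20, grouped by remainder modulo 7.
mod7 = equivalence_classes(range(-14, 21), lambda n: n % 7)
for remainder in sorted(mod7):
    print(remainder, mod7[remainder])

# Example (iii): grid points (x, y), grouped by x - y (lines of slope 1).
grid = [(x, y) for x in range(3) for y in range(3)]
lines = equivalence_classes(grid, lambda p: p[0] - p[1])
print(lines[0])
```

Every element lands in exactly one group, which is the partition property of the proposition above in miniature.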


One final definition is appropriate here. Let $S$ be a set and let $R$ be an equivalence relation on $S$. The quotient set of $S$ by $R$ is the set of equivalence classes in $S$. In symbols, this is

\[ S/R = \{ [a] \mid a \in S \}. \]

We shall conclude this subsection with a particularly important quotient set. Let $n \in \mathbb{Z}^+$ and let $R$ be the relation “$\equiv \pmod n$.” One usually writes $\mathbb{Z}_n$ for the corresponding quotient set.⁴ That is,

\[ \mathbb{Z}_n = \{ [m] \mid m \in \mathbb{Z} \} = \{ [0], [1], [2], \ldots, [n-1] \}. \]

⁴ It’s important to note that the IB examination authors do not use the brackets in writing the elements of $\mathbb{Z}_n$; they simply write $\mathbb{Z}_n = \{0, 1, 2, \ldots, n-1\}$. While logically incorrect, this really shouldn’t cause too much confusion.
EXERCISES

1. Let $S = \{1, 2, 3, 4\}$. How many relations are there on $S$?


2. Let $m \in \mathbb{Z}^+$ and show that “$\equiv \pmod m$” is an equivalence relation on $\mathbb{Z}$. How many distinct equivalence classes mod $m$ are there in $\mathbb{Z}$?


3. Let $T_5$ be the set of all 2-element subsets of $\{1, 2, 3, 4, 5\}$ and define the relation $R$ on $T_5$ by stipulating that $A_1 R A_2 \iff A_1 \cap A_2 = \emptyset$. Compute $|R|$. Which of the three properties of an equivalence relation does $R$ satisfy?


4. Let $f : S \to T$ be a mapping and define a relation on $S$ by stipulating that $sRs' \iff f(s) = f(s')$. (Note that this says that $sRs' \iff s$ and $s'$ are in the same fibre of $f$.) Show that $R$ is an equivalence relation.


5. Define the following relation on the Cartesian 3-space $\mathbb{R}^3$: $PRQ \iff P$ and $Q$ are the same distance from the origin. Prove that $R$ is an equivalence relation on $\mathbb{R}^3$ and determine the equivalence classes in $\mathbb{R}^3$.


. Suppose that we try to define a function f : Z, — Z by setting 


f({n]) =n — 2. What’s wrong with this definition? 


. Suppose that we try to define a function g : Z, > {+1, +7} by 


setting g([n]) = 7". Does this function suffer the same difficulty as 
that in Exercise 6? 


. Suppose that we try to define function tT : Z — Zy, by setting 


T(n) = [n — 2]. Does this function suffer the same difficulty as 
that in Exercise 6? What’s going on here? 


. Let R be the relation on the real line given by rRy & x—yeEZ, 


and denote by the R/R the corresponding quotient set.” Suppose 
that we try to define p: R/R — C by setting p((x]) = cos 27a + 
isin 27x. Does this definition make sense? 


>Most authors denote this quotient set by R/Z. 
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10. 


ll. 


We saw on page 141 that the complete graph Ks cannot be planar, 
i.e., cannot be drawn in the plane. Let’s see if we can draw it 
elsewhere. Start by letting 


C = {(z,y) € R*|2?+y" <1}; 


therefore, C' is just the “disk” in the plane with radius 1. We 
define an equivalence relation R on C' by specifying that a point 
(x,y) on the boundary of C (so x? + y? = 1) is equivalent with 
its “antipodal” point (—x,—y). Points on the interior of C’ are 
equivalent only with themselves. We call the quotient set C/R the 
real projective plane, often written RP?. (Recall that C/R is 
just the set of equivalence classes.) 


Explain how the drawing to the 
right can be interpreted as a draw- 
ing of Ks in the real projective 
plane. Also, compute the Euler 
characteristic v—e+ f for this draw- 
ing. 


11. Here's another construction of $\mathbb{RP}^2$, the real projective plane; see
Exercise 10, above. Namely, take the unit sphere $S^2 \subseteq \mathbb{R}^3$, defined
by $S^2 = \{(x,y,z) \in \mathbb{R}^3 \mid x^2 + y^2 + z^2 = 1\}$. We define a
"geometry" on $S^2$ by defining points to be the usual points on
the sphere and defining lines to be "great circles," i.e., circles on
the sphere which form the shortest path between any two distinct
points on such a circle. Notice, therefore, that the equator on the
earth (approximately a sphere) is a great circle; so are the lines of
longitude (each meridian taken together with its opposite meridian).
With this definition of lines and points we see that Euclid's
parallel postulate⁶ is violated as distinct parallel lines simply don't
exist: any pair of distinct lines must meet in exactly two points.

⁶viz., that given any line $\ell_1$ and any point P not on $\ell_1$ there is a unique line $\ell_2$
through the point P and not intersecting $\ell_1$.

Form an equivalence relation R on $S^2$ by declaring any point on the
sphere to be equivalent with its "antipode." (Thus on the earth,
the north and south poles would be equivalent.) The quotient set
$S^2/R$ is often called the real projective plane and denoted $\mathbb{RP}^2$.


(a) Give at least a heuristic argument that the constructions of
$\mathbb{RP}^2$ given in this exercise and in Exercise 10 are equivalent.

(b) Show that in $\mathbb{RP}^2$ any pair of distinct points determines
a unique line and that any pair of distinct lines intersects in a
unique point.⁷


4.2 Basics of Group Theory 


4.2.1 Motivation—graph automorphisms 


We shall start this discussion with one of my favorite questions, namely 
which of the following graphs is more “symmetrical?” 


⁷This says that the real projective plane has a "point-line duality" not enjoyed by the usual
Euclidean plane.

While this question might not quite make sense at the outset, it is
my intention to have the reader rely mostly on intuition. Incidentally,
I have asked this question many times and to many people (some of them
mathematicians) and often, if not usually, I get the wrong intuitive
mathematicians—and often, if not usually, I get the wrong intuitive 
response! Without pursuing the details any further, suffice it to say for 
now that group theory is the “algebraitization” of symmetry. Put less 
obtusely, groups give us a way of “quantifying” symmetry: the larger 
the group (which is something we can often compute!) the greater the 
symmetry. This is hardly a novel view of group theory. Indeed the
prominent mathematician of the late 19th and early 20th century Felix
Klein regarded all of geometry as nothing more than the study of
properties invariant under groups.

Apart from quantifying symmetry, groups can give us a more ex- 
plicit way to separate types of symmetry. As we’ll see shortly, the two 
geometrical figures below both have four-fold symmetry (that is, they 
have groups of order 4), but the nature of the symmetry is different 
(the groups are not isomorphic). 


Anyway, let's return briefly to the question raised above, namely
that of the relative symmetry of the two diagrams above. Given a
graph G we now consider the set of all permutations $\sigma$ of the set of
vertices of G such that

vertices a and b form an edge of G $\Leftrightarrow$ $\sigma(a)$ and $\sigma(b)$ form an edge of G.

A permutation satisfying the above is called an automorphism of
the graph G, and the set of all such automorphisms is often denoted
Aut(G). The most important fact related to graph automorphisms is
the following:


Proposition. The composition of two graph automorphisms is a graph 
automorphism. Also the inverse of a graph automorphism is a graph 
automorphism. 


Proof. This is quite simple. Let $\sigma$ and $\tau$ be two graph automorphisms,
and let v and w be vertices of the graph. Then $\sigma \circ \tau(v)$ and $\sigma \circ \tau(w)$
form an edge $\Leftrightarrow$ $\tau(v)$ and $\tau(w)$ form an edge $\Leftrightarrow$ v and w form an
edge. Hence $\sigma \circ \tau$ is a graph automorphism. Next, let $\sigma$ be a graph
automorphism. Since $\sigma$ is a permutation, it is bijective and so the
inverse function $\sigma^{-1}$ exists. Thus we need to show that vertices v and
w form an edge $\Leftrightarrow$ $\sigma^{-1}(v)$ and $\sigma^{-1}(w)$ form an edge. If v and w form
an edge, this says that $\sigma(\sigma^{-1}(v))$, $\sigma(\sigma^{-1}(w))$ form an edge. But
since $\sigma$ is an automorphism, we see that $\sigma^{-1}(v)$ and $\sigma^{-1}(w)$ form an
edge. In other words,

v and w form an edge $\Rightarrow$ $\sigma^{-1}(v)$ and $\sigma^{-1}(w)$ form an edge.

Conversely, assume that $\sigma^{-1}(v)$ and $\sigma^{-1}(w)$ form an edge. Then since
$\sigma$ is an automorphism, we may apply $\sigma$ to conclude that the vertices
$\sigma(\sigma^{-1}(v))$, $\sigma(\sigma^{-1}(w))$ form an edge. But this says that v and w form
an edge, i.e.,

$\sigma^{-1}(v)$ and $\sigma^{-1}(w)$ form an edge $\Rightarrow$ v and w form an edge,

which proves that $\sigma^{-1}$ is a graph automorphism.
The above fact will turn out to be hugely important! 
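The proposition can be checked by brute force on a small graph. The sketch below is only an illustration under stated assumptions: it uses a 4-cycle (a square), which is not necessarily one of the graphs pictured in this section, and the function names `automorphisms`, `compose`, and `inverse` are hypothetical helpers introduced here.

```python
from itertools import permutations

def automorphisms(vertices, edges):
    """Return every vertex permutation (as a dict) mapping the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    found = []
    for image in permutations(vertices):
        sigma = dict(zip(vertices, image))
        # A bijection of the vertices is injective on edges, so for a finite
        # graph "edges go to edges" is the same as the if-and-only-if condition.
        if {frozenset((sigma[a], sigma[b])) for a, b in edges} == edge_set:
            found.append(sigma)
    return found

def compose(sigma, tau):
    """The permutation sigma o tau (apply tau first, then sigma)."""
    return {v: sigma[tau[v]] for v in tau}

def inverse(sigma):
    """The inverse permutation of sigma."""
    return {w: v for v, w in sigma.items()}

# Illustrative graph: the square 1-2-3-4-1.
square_vertices = [1, 2, 3, 4]
square_edges = [(1, 2), (2, 3), (3, 4), (4, 1)]
autos = automorphisms(square_vertices, square_edges)

closed = all(compose(s, t) in autos for s in autos for t in autos)
has_inverses = all(inverse(s) in autos for s in autos)
print(len(autos), closed, has_inverses)
```

For the square this finds eight automorphisms, and both checks succeed, in agreement with the proposition.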


EXERCISES 


1. Consider the graph shown below.

(i) Give an automorphism $\sigma$ which takes vertex 1 to vertex 2 by
completing the following:

Compute the inverse of this automorphism.

(ii) Give an automorphism $\tau$ which maps vertex 1 to 3 and fixes
vertex 5 (that is, $\tau(5) = 5$).

$$\tau:\ \begin{array}{cccccc} 1 & 2 & 3 & 4 & 5 & 6 \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow \\ 3 & & & & 5 & \end{array}$$

Compute the inverse of this automorphism.

2. Consider the graph shown below.

(i) Give an automorphism $\sigma$ which takes vertex 1 to vertex 2 by
completing the following:

$$\sigma:\ \begin{array}{cccccc} 1 & 2 & 3 & 4 & 5 & 6 \\ \downarrow & \downarrow & \downarrow & \downarrow & \downarrow & \downarrow \\ 2 & & & & & \end{array}$$

Compute the inverse of this automorphism.

(ii) Give an automorphism $\tau$ which maps vertex 1 to 3 and fixes
vertex 5 (that is, $\tau(5) = 5$).

Compute the inverse of this automorphism.

4.2.2 Abstract algebra—the concept of a binary operation 


Intuitively, a binary operation on a set S is a rule for "multiplying"
elements of S together to form new elements.

Definition of Binary Operation on a Set. A binary operation
on a non-empty set S is a mapping $* : S \times S \to S$.

No more, no less! We usually write $s * s'$ in place of the more formal
$*(s, s')$.


We have a wealth of examples available; we’ll review just a few of 
them here. 


• The familiar operations + and · are binary operations on our
favorite number systems: $\mathbb{Z}$, $\mathbb{Q}$, $\mathbb{R}$, $\mathbb{C}$.

• Note that if S is the set of irrational numbers then neither + nor
· defines a binary operation on S. (Why not?)

• Note that subtraction − defines a binary operation on $\mathbb{R}$.

• Let $\mathrm{Mat}_n(\mathbb{R})$ denote the n × n matrices with real coefficients. Then
both + (matrix addition) and · (matrix multiplication) define
binary operations on $\mathrm{Mat}_n(\mathbb{R})$.

• Let S be any set and let $F(S) = \{\text{functions} : S \to S\}$. Then
function composition ∘ defines a binary operation on F(S). (This
is a particularly important example.)

• Let $\mathrm{Vect}_3(\mathbb{R})$ denote the vectors in 3-space. Then the vector
cross product × is a binary operation on $\mathrm{Vect}_3(\mathbb{R})$. Note that the
scalar product · does not define a binary operation on $\mathrm{Vect}_3(\mathbb{R})$.

• Let A be a set and let $2^A$ be its power set. The operations ∩, ∪,
and + (symmetric difference) are all important binary operations
on $2^A$.

• Let S be a set and let Sym(S) be the set of all permutations on
S. Then function composition ∘ defines a binary operation on
Sym(S). We really should prove this. Thus let $\sigma, \tau : S \to S$
be permutations; thus they are one-to-one and onto. We need to
show that $\sigma \circ \tau : S \to S$ is also one-to-one and onto.

$\sigma \circ \tau$ is one-to-one: Assume that $s, s' \in S$ and that
$\sigma \circ \tau(s) = \sigma \circ \tau(s')$. Since $\sigma$ is one-to-one, we conclude that
$\tau(s) = \tau(s')$. Since $\tau$ is one-to-one, we conclude that s = s',
which proves that $\sigma \circ \tau$ is also one-to-one.

$\sigma \circ \tau$ is onto: We need to prove that for any $s \in S$ there exists
some $s' \in S$ such that $\sigma \circ \tau(s') = s$. However, since $\sigma$ is onto,
there must exist some element $s'' \in S$ such that $\sigma(s'') = s$.
But since $\tau$ is onto there exists some element $s' \in S$ such that
$\tau(s') = s''$. Therefore, it follows that
$\sigma \circ \tau(s') = \sigma(\tau(s')) = \sigma(s'') = s$, proving that $\sigma \circ \tau$ is onto.


Before looking further for examples, I'd like to amplify the issue of
"closure," as it will give many additional examples of binary
operations.

Definition of Closure. Let S be a set, let ∗ be a binary operation
on S, and let $\emptyset \ne T \subseteq S$. We say that T is closed under the binary
operation ∗ if $t * t' \in T$ whenever $t, t' \in T$. In this case it then
follows that ∗ also defines a binary operation on T. Where the above
IB remark is misleading is that we don't speak of a binary operation as
being closed; we speak of a subset being closed under the given binary
operation!


More examples ... 


• Let $\mathbb{R}$ be the real numbers. Then $\mathbb{Z}$ and $\mathbb{Q}$ are both closed under
both addition and multiplication.

• Note that the negative real numbers are not closed under
multiplication.

• Let $\mathbb{Z}[\sqrt{5}] = \{a + b\sqrt{5} \mid a, b \in \mathbb{Z}\}$. Then $\mathbb{Z}[\sqrt{5}]$ is easily checked
to be closed under both addition + and multiplication · of complex
numbers. (Addition is easy. For multiplication, note that if
$a, b, c, d \in \mathbb{Z}$, then

$$(a + b\sqrt{5}) \cdot (c + d\sqrt{5}) = (ac + 5bd) + (ad + bc)\sqrt{5}.$$

Note that the above example depends heavily on the fact that $\mathbb{Z}$
is closed under both addition and multiplication.)

• Let $\mathrm{GL}_n(\mathbb{R}) \subseteq \mathrm{Mat}_n(\mathbb{R})$ denote the matrices of determinant ≠ 0,
and let $\mathrm{GL}_n^+(\mathbb{R}) \subseteq \mathrm{Mat}_n(\mathbb{R})$ denote the matrices of positive
determinant. Then both of these sets are closed under multiplication;
neither of these sets is closed under addition.

• The subset $\{0, \pm i, \pm j, \pm k\} \subseteq \mathrm{Vect}_3(\mathbb{R})$ is closed under vector
cross product ×. The subset $\{0, i, j, k\} \subseteq \mathrm{Vect}_3(\mathbb{R})$ is not. (Why
not?)

• The subset $\{-1, 0, 1\} \subseteq \mathbb{Z}$ is closed under multiplication but not
under addition.

• Let X be a set and let Sym(X) denote the set of permutations.
Fix an element $x \in X$ and let $\mathrm{Sym}_x(X) \subseteq \mathrm{Sym}(X)$ be the subset
of all permutations which fix the element x. That is to say,

$$\mathrm{Sym}_x(X) = \{\sigma \in \mathrm{Sym}(X) \mid \sigma(x) = x\}.$$

Then $\mathrm{Sym}_x(X)$ is closed under function composition ∘ (Exercise 5).

• We have two more extremely important binary operations, namely
addition and multiplication on $\mathbb{Z}_n$, the integers modulo n. These
operations are defined by setting

$$[a] + [b] = [a + b], \quad\text{and}\quad [a] \cdot [b] = [a \cdot b], \qquad a, b \in \mathbb{Z}.⁸$$

We shall sometimes drop the [·] notation; as long as the context is
clear, this shouldn't cause any confusion.


⁸A somewhat subtle issue related to this "definition" is whether it makes sense. The problem is
that the same equivalence class can have many names: for example if we are considering congruence
modulo 5, we have [3] = [8] = [−2], and so on. Likewise [4] = [−1] = [14]. Note that [3] + [4] = [7] =
[2]. Since [3] = [−2] and since [4] = [14], adding [−2] and [14] should give the same result. But they
do: [−2] + [14] = [12] = [2]. Here is how a proof that this definition of addition really makes sense
(i.e., that it is well defined) would run. Let [a] = [a'] and [b] = [b']. Then a' = a + 5k for some
integer k and b' = b + 5l for some integer l. Therefore
[a'] + [b'] = [a + 5k] + [b + 5l] = [a + b + 5k + 5l] = [a + b] = [a] + [b]. Similar comments show
that multiplication likewise makes sense. Finally this generalizes immediately to $\mathbb{Z}_n$, for any
positive integer n.

We display addition and multiplication on the integers modulo 5
in the following obvious tables:

    + | 0 1 2 3 4          · | 0 1 2 3 4
    --+----------          --+----------
    0 | 0 1 2 3 4          0 | 0 0 0 0 0
    1 | 1 2 3 4 0          1 | 0 1 2 3 4
    2 | 2 3 4 0 1          2 | 0 2 4 1 3
    3 | 3 4 0 1 2          3 | 0 3 1 4 2
    4 | 4 0 1 2 3          4 | 0 4 3 2 1
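The tables above, and the well-definedness argument of the footnote, can be reproduced mechanically. The following is a minimal sketch; the names `cayley_table` and `well_defined` are hypothetical helpers, and the check only samples a few representatives of each class rather than proving anything.

```python
def cayley_table(n, op):
    """Table of op on the representatives 0..n-1, reduced mod n."""
    return [[op(a, b) % n for b in range(n)] for a in range(n)]

def well_defined(n, op, span=3):
    """Check that replacing a, b by a + n*k, b + n*l never changes the class of op(a, b)."""
    shifts = range(-span, span + 1)
    return all(
        op(a + n * k, b + n * l) % n == op(a, b) % n
        for a in range(n) for b in range(n)
        for k in shifts for l in shifts
    )

add = lambda a, b: a + b
mul = lambda a, b: a * b

add5 = cayley_table(5, add)
mul5 = cayley_table(5, mul)
for row in mul5:
    print(row)
print(well_defined(5, add), well_defined(5, mul))
```

Python's `%` returns a non-negative remainder for a positive modulus, so negative representatives such as −2 are reduced to the expected class.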


EXERCISES 


1. Denote by $2\mathbb{Z} \subseteq \mathbb{Z}$ the even integers. Is $2\mathbb{Z}$ closed under addition?
Under multiplication?

2. Is the set of odd integers closed under either addition or
multiplication?

3. On the set $\mathbb{Z}$ of integers define the binary operation ∗ by setting
$x * y = x + 2y \in \mathbb{Z}$. Is the set of even integers closed under ∗? Is
the set of odd integers closed under ∗?

4. Let $U_2(\mathbb{R}) \subseteq \mathrm{Mat}_2(\mathbb{R})$ be defined by setting


Is $U_2(\mathbb{R})$ closed under matrix addition? Under matrix
multiplication?

5. Let X be a set and let Sym(X) be the set of permutations of X.
Fix an element $x \in X$ and show that $\mathrm{Sym}_x(X)$ is closed under
function composition "∘."

6. Let A be a set and let $\mathcal{A} \subseteq 2^A$ be the subset of the power set
consisting of all finite subsets of even cardinality. Show that if
$|A| \ge 3$, then $\mathcal{A}$ is not closed under either ∩ or ∪ but it is closed
under +. (Why do we need to assume that $|A| \ge 3$?)

7. Are the non-zero elements {1, 2, 3, 4, 5} in $\mathbb{Z}_6$ closed under
multiplication?

8. Are the non-zero elements of $\mathbb{Z}_p$, where p is a prime number, closed
under multiplication?

9. For any positive integer n, set $N_n = \{1, 2, \dots, n\}$, and let $P(N_n)$
be the power set of $N_n$ (see page 189). Show that for any integer
N with $0 \le N \le 2^n$ there exist subsets $A, B \subseteq P(N_n)$ such that

(a) |A| = |B| = N,
(b) A is closed under ∩, and
(c) B is closed under ∪.

(Hint: Use induction, together with the De Morgan laws.)


4.2.3 Properties of binary operations 


Ordinary addition and multiplication enjoy very desirable properties,
most notably, associativity and commutativity. Matrix multiplication
is also associative (though proving this takes a little work), but not
commutative. The vector cross product of vectors in 3-space is neither
associative nor commutative. (The cross product is "anticommutative"
in the sense that for vectors u and v, $u \times v = -v \times u$. The
nonassociativity is called for in Exercise 2, below.) This motivates the
following general definition: let S be a set with a binary operation ∗.
We recall that

∗ is associative if $s_1 * (s_2 * s_3) = (s_1 * s_2) * s_3$, for all $s_1, s_2, s_3 \in S$;

∗ is commutative if $s_1 * s_2 = s_2 * s_1$, for all $s_1, s_2 \in S$.

Next, we say that e is an identity with respect to the binary
operation ∗ if

$$e * s = s * e = s \quad\text{for all } s \in S.$$

If the binary operation has an identity e, then this identity is unique.
Indeed, if e' were another identity, then we would have

$$e \underbrace{=}_{\substack{\text{because } e' \\ \text{is an identity}}} e * e' \underbrace{=}_{\substack{\text{because } e \\ \text{is an identity}}} e'.$$

(Cute, huh?)

Finally, assume that the binary operation ∗ is associative and has
an identity element e. The element $s' \in S$ is said to be an inverse of
$s \in S$ relative to ∗ if $s' * s = s * s' = e$. Note that if s has an inverse,
then this inverse is unique. Indeed, suppose that s' and s'' are both
inverses of s. Watch this:

$$s' = s' * e = \underbrace{s' * (s * s'') = (s' * s) * s''}_{\text{note how associativity is used}} = e * s'' = s''.$$
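On a finite set, each of the properties just defined can be tested exhaustively. The sketch below is offered only as an illustration; the helper names (`is_associative`, `is_commutative`, `find_identity`, `find_inverses`) and the choice of the representatives 0, ..., 5 of the integers modulo 6 are assumptions made for the example.

```python
from itertools import product

def is_associative(S, op):
    return all(op(a, op(b, c)) == op(op(a, b), c) for a, b, c in product(S, repeat=3))

def is_commutative(S, op):
    return all(op(a, b) == op(b, a) for a, b in product(S, repeat=2))

def find_identity(S, op):
    """Return the (necessarily unique) two-sided identity, or None if there is none."""
    for e in S:
        if all(op(e, s) == s and op(s, e) == s for s in S):
            return e
    return None

def find_inverses(S, op, e):
    """Map each invertible element to its two-sided inverse."""
    return {s: t for s in S for t in S if op(s, t) == e and op(t, s) == e}

Z6 = range(6)
add = lambda a, b: (a + b) % 6
sub = lambda a, b: (a - b) % 6
mul = lambda a, b: (a * b) % 6

print(is_associative(Z6, add), is_commutative(Z6, add), find_identity(Z6, add))
print(is_associative(Z6, sub), find_identity(Z6, sub))
print(find_inverses(Z6, mul, find_identity(Z6, mul)))
```

Subtraction modulo 6 fails associativity and has no identity, while multiplication modulo 6 has identity 1 but only some elements are invertible.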


EXERCISES 


1. In each case below, a binary operation * is given on the set Z of 
integers. Determine whether the operation is associative, commu- 
tative, and whether an identity exists. 


2. Give an example of three vectors u, v, and w in 3-space such that
$u \times (v \times w) \ne (u \times v) \times w$.

3. Let A be a set and let $2^A$ be its power set. Determine whether the
operations ∩, ∪, and + are associative, commutative, and whether
an identity element exists for the operation.

4. Let A be a nonempty set. For any non-empty set $B \subseteq A$ find the
inverse of B with respect to symmetric difference +.

5. Let $\mathrm{Mat}_n(\mathbb{R})$ be the n × n matrices with real coefficients and define
the binary operation ∗ by setting

$$A * B = AB - BA,$$

where $A, B \in \mathrm{Mat}_n(\mathbb{R})$. Is ∗ associative? Commutative? Is there
an identity?

6. Let S be a set and let F(S) be the set of all functions $f : S \to S$. Is
composition "∘" associative? Commutative? Is there an identity?

7. Let S be a set and consider the set $F(S, \mathbb{R})$ of all real-valued
functions $f : S \to \mathbb{R}$. Define addition + and multiplication "·" on
$F(S, \mathbb{R})$ by the rules

$$(f + g)(s) = f(s) + g(s), \qquad (f \cdot g)(s) = f(s) \cdot g(s), \qquad s \in S.$$

Are these operations associative? Commutative? What about
identities? What about inverses?


4.2.4 The concept of a group 


Let (G, ∗) be a set together with a binary operation. We say that (G, ∗)
is a group if the following three properties hold:

∗ is associative: that is, $g_1 * (g_2 * g_3) = (g_1 * g_2) * g_3$ for all $g_1, g_2, g_3 \in G$;

G has an identity: that is, there exists an element $e \in G$ such that
$e * g = g * e = g$, for all $g \in G$;

Existence of inverses: that is, for every $g \in G$, there exists an
element $g' \in G$ with the property that $g' * g = g * g' = e$.

We have already noted on page 216 that the identity element and
inverses are unique. This says that in denoting the inverse of an
element $g \in G$ we may use, for example, the notation $g^{-1}$ to denote this
inverse, knowing that we are unambiguously referring to a unique
element. However, inverses (and identities) aren't always denoted in this
way. If we use the symbol + for our binary operation, it's more
customary to write "0" for the identity and to write −a for the inverse of
the element a. Finally it's worth mentioning that in certain contexts,
the binary operation is simply denoted by "juxtaposition," writing, for
example, xy in place of x ∗ y. This happens, for instance, in denoting
multiplication of complex numbers, polynomials, matrices, and is
even used to denote the binary operation in an abstract group when no
confusion is likely to result.


We shall now survey some very important examples of groups. 


1. (The symmetric group) Let X be a set and let (Sym(X), ∘)
be the set of all bijections on X, with function composition as
the binary operation. At the risk of being redundant, we shall
carefully show that (Sym(X), ∘) is a group.

∘ is associative: let $\sigma_1, \sigma_2, \sigma_3 \in \mathrm{Sym}(X)$. To show that
$\sigma_1 \circ (\sigma_2 \circ \sigma_3) = (\sigma_1 \circ \sigma_2) \circ \sigma_3$ we need to show that
they are the same permutations on X, i.e., we must show that
for all $x \in X$, $\sigma_1 \circ (\sigma_2 \circ \sigma_3)(x) = (\sigma_1 \circ \sigma_2) \circ \sigma_3(x)$. But

$$\sigma_1 \circ (\sigma_2 \circ \sigma_3)(x) = \sigma_1((\sigma_2 \circ \sigma_3)(x)) = \sigma_1(\sigma_2(\sigma_3(x))),$$

whereas

$$(\sigma_1 \circ \sigma_2) \circ \sigma_3(x) = (\sigma_1 \circ \sigma_2)(\sigma_3(x)) = \sigma_1(\sigma_2(\sigma_3(x))),$$

which is the same thing! Thus we have proved that ∘ is
associative.

Existence of identity: Let $e : X \to X$ be the function e(x) = x, for
all $x \in X$. Then clearly e is a permutation, i.e., $e \in \mathrm{Sym}(X)$.
Furthermore, for all $\sigma \in \mathrm{Sym}(X)$, and for all $x \in X$, we have
$e \circ \sigma(x) = e(\sigma(x)) = \sigma(x)$, and $\sigma \circ e(x) = \sigma(e(x)) = \sigma(x)$,
which proves that $e \circ \sigma = \sigma \circ e = \sigma$.

Existence of inverses: Let $\sigma \in \mathrm{Sym}(X)$ and let $\sigma^{-1} : X \to X$
denote its inverse function. Therefore $\sigma^{-1}(x) = y$ means
precisely that $\sigma(y) = x$, from which it follows that $\sigma^{-1}$ is a
permutation (i.e., $\sigma^{-1} \in \mathrm{Sym}(X)$) and
$(\sigma^{-1} \circ \sigma)(x) = x = (\sigma \circ \sigma^{-1})(x)$
for all $x \in X$, which says that $\sigma^{-1} \circ \sigma = e = \sigma \circ \sigma^{-1}$.

I firmly believe that the vast majority of practicing group theorists
consider the symmetric groups the most important of all groups!


2. (The General Linear Group) Let $\mathbb{R}$ be the real numbers, let n
be a positive integer, and let $(\mathrm{GL}_n(\mathbb{R}), \cdot)$ be the set of all n × n
matrices with coefficients in $\mathbb{R}$ and having non-zero determinant,
where · denotes matrix multiplication. (However, we have
already noted above that we'll often use juxtaposition to denote
matrix multiplication.) Since matrix multiplication is associative,
since the identity matrix has determinant 1 (≠ 0), and since the
inverse of any matrix of non-zero determinant exists (and also has
non-zero determinant) we conclude that $(\mathrm{GL}_n(\mathbb{R}), \cdot)$ is a group.
We remark here that we could replace the coefficients $\mathbb{R}$ with
other systems of coefficients, such as $\mathbb{C}$ or $\mathbb{Q}$. (We'll look at another
important example in Exercise 4 on page 223.)

3. $(\mathbb{C}, +)$, $(\mathbb{R}, +)$, $(\mathbb{Q}, +)$, $(\mathbb{Z}, +)$ are all groups.

4. Let $\mathbb{C}^* = \mathbb{C} - \{0\}$ (similarly we can define $\mathbb{R}^*$, $\mathbb{Q}^*$, $\mathbb{Z}^*$, etc.); then
$(\mathbb{C}^*, \cdot)$ is a group. Likewise, so are $(\mathbb{R}^*, \cdot)$ and $(\mathbb{Q}^*, \cdot)$ but not
$(\mathbb{Z}^*, \cdot)$.

5. Let $\mathbb{Z}_n$ denote the integers modulo n. Then $(\mathbb{Z}_n, +)$ is a group
with identity [0] (again, we'll often just write 0); the inverse of
[x] is just [−x].

6. Let (G, ∘) be the set of automorphisms of some graph. If X is the
set of vertices of this graph, then $G \subseteq \mathrm{Sym}(X)$; by the proposition
on page 208, we see that G is closed under ∘. Since we already
know that ∘ is associative on Sym(X) it certainly continues to be
associative for G. Next, the identity permutation of the vertices
of the graph is clearly a graph automorphism. Finally, the same
proposition on page 208 shows that each element $g \in G$ has an
inverse, proving that G is a group.


7. There is one other group that is well worth mentioning, and is
a multiplicative version of $(\mathbb{Z}_n, +)$. We start by writing
$\mathbb{Z}_n^* = \{1, 2, 3, \dots, n-1\}$ (note, again, that we have dispensed with
writing the brackets [·]). We would like to consider whether this
is a group relative to multiplication. Consider, for example, the
special case n = 10. Note that despite the fact that $2, 5 \in \mathbb{Z}_{10}^*$,
we have $2 \cdot 5 = 0 \notin \mathbb{Z}_{10}^*$. In other words $\mathbb{Z}_{10}^*$ is not closed under
multiplication and, hence, certainly cannot form a group.

The problem here is pretty simple. If the integer n is not a prime
number, say, $n = n_1 n_2$, where $1 < n_1, n_2 < n$, then it's clear that
while $n_1, n_2 \in \mathbb{Z}_n^*$ we have $n_1 n_2 = 0 \notin \mathbb{Z}_n^*$. This says already that
$(\mathbb{Z}_n^*, \cdot)$ is not a group. Thus, in order for $(\mathbb{Z}_n^*, \cdot)$ to have any chance
at all of being a group, we must have that n = p, some prime
number. Next, we shall show that if p is prime, then $\mathbb{Z}_p^*$ is closed
under multiplication. This is easy: if $a, b \in \mathbb{Z}_p^*$, then neither a nor
b is divisible by p. But then, since p is prime, ab is not divisible by p,
which means that ab ≠ 0 and so, in fact, $ab \in \mathbb{Z}_p^*$, proving that $\mathbb{Z}_p^*$
is closed under multiplication.

Next, note that since multiplication is associative in $\mathbb{Z}_p$, and since
$\mathbb{Z}_p^* \subseteq \mathbb{Z}_p$ we have that multiplication is associative in $\mathbb{Z}_p^*$. Clearly
$1 \in \mathbb{Z}_p^*$ and is the multiplicative identity. It remains only to show
that every element of $\mathbb{Z}_p^*$ has a multiplicative inverse. There are a
number of ways to do this; perhaps the following argument is the
most elementary. Fix an element $a \in \mathbb{Z}_p^*$ and consider the elements

$$1 \cdot a,\ 2 \cdot a,\ 3 \cdot a,\ \dots,\ (p-1) \cdot a \in \mathbb{Z}_p^*.$$

If any two of these elements are the same, say a'a = a''a, for
distinct elements a' and a'', then (a' − a'')a = 0. But this would
say that $p \mid (a' - a'')a$; since p is prime, and since $p \nmid a$, this implies
that $p \mid (a' - a'')$. But $1 \le a', a'' < p$ and so this is impossible unless
a' = a'', contradicting our assumption that they were distinct in
the first place! Finally, since we now know that the elements in
the above list are all distinct, there are exactly p − 1 such elements,
which proves already that

$$\{1 \cdot a,\ 2 \cdot a,\ 3 \cdot a,\ \dots,\ (p-1) \cdot a\} = \{1, 2, 3, \dots, p-1\}.$$

In particular, it follows that $1 \in \{1 \cdot a,\ 2 \cdot a,\ 3 \cdot a,\ \dots,\ (p-1) \cdot a\}$,
and so a'a = 1 for some $a' \in \mathbb{Z}_p^*$, proving that $a' = a^{-1}$. In short,
we have proved that $(\mathbb{Z}_p^*, \cdot)$ is a group.
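The counting argument above is easy to watch in action. In this sketch (the function names `multiples_row` and `inverse_by_listing` are hypothetical), the row 1·a, 2·a, ..., (p−1)·a is listed modulo p, and the inverse of a is read off as the multiplier that produces 1. The comparison with the built-in `pow(a, -1, p)` assumes Python 3.8 or later.

```python
def multiples_row(a, n):
    """The list 1*a, 2*a, ..., (n-1)*a reduced modulo n."""
    return [(k * a) % n for k in range(1, n)]

def inverse_by_listing(a, p):
    """Inverse of a modulo a prime p, found as the k with k*a = 1."""
    return multiples_row(a, p).index(1) + 1

row = multiples_row(3, 7)
print(row)                        # a rearrangement of 1..6
print(inverse_by_listing(3, 7))   # the k with 3k = 1 (mod 7)
print(multiples_row(2, 10))       # for n = 10 the row contains 0 and never 1
```

For the prime 7 the row is a rearrangement of 1, ..., 6, so 1 must occur; for n = 10 and a = 2 it never does, matching the discussion of the non-prime case.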


The multiplication table for a (finite) group (G, ∗) is just a table
listing all possible products.⁹ We give the multiplication table for the
group $(\mathbb{Z}_7^*, \cdot)$ below:

    · | 1 2 3 4 5 6
    --+------------
    1 | 1 2 3 4 5 6
    2 | 2 4 6 1 3 5
    3 | 3 6 2 5 1 4
    4 | 4 1 5 2 6 3
    5 | 5 3 1 6 4 2
    6 | 6 5 4 3 2 1

On the basis of the above table, we find, for instance, that $4^{-1} = 2$ and
that $3^{-1} = 5$. Even more important is this: if we let x = 3, we get

$$x^1 = 3,\quad x^2 = 2,\quad x^3 = 6,\quad x^4 = 4,\quad x^5 = 5,\quad x^6 = 1,$$

which says that every element of $\mathbb{Z}_7^*$ can be expressed as some
power of the single element 3. This is both important and more
subtle than it looks, and shall be the topic of the next subsection.

⁹The multiplication table for a group is often called the Cayley table after the English
mathematician Arthur Cayley (1821-1895).

A group (G, ∗) is called Abelian¹⁰ if the operation ∗ is commutative.
Granted, it would make good sense to call such groups "commutative,"
but we enjoy naming concepts after influential mathematicians. In the
above list of groups you should be able to separate the Abelian groups
from the non-Abelian ones.


EXERCISES 


1. Consider the two graphs given at the beginning of this section;
here they are again:

Write down the elements of the corresponding automorphism groups,
and then give the corresponding Cayley tables.

2. In the group $(\mathbb{Z}_{17}^*, \cdot)$, find $2^{-1}$ and $5^{-1}$. Find any elements x such
that $x^2 = 1$.

3. Let X = {1, 2, 3} and consider the group Sym(X) of permutations
on X. Define the following two permutations:

$$\sigma = \begin{pmatrix} 1 & 2 & 3 \\ \downarrow & \downarrow & \downarrow \\ 2 & 3 & 1 \end{pmatrix}, \qquad \tau = \begin{pmatrix} 1 & 2 & 3 \\ \downarrow & \downarrow & \downarrow \\ 1 & 3 & 2 \end{pmatrix}.$$

(i) Show that the six elements $e, \sigma, \sigma^2, \tau, \sigma\tau, \sigma^2\tau$ comprise all of
the elements of this group.

(ii) Show that $\sigma^3 = \tau^2 = e$ and that $\tau\sigma = \sigma^2\tau$.

(iii) From the above, complete the multiplication table:

    ∘    | e     σ     σ²    τ     στ    σ²τ
    -----+-----------------------------------
    e    | e     σ     σ²    τ     στ    σ²τ
    σ    | σ     σ²
    σ²   | σ²                      τ
    τ    | τ                             σ
    στ   | στ                      e
    σ²τ  | σ²τ

¹⁰After the Norwegian mathematician Niels Henrik Abel (1802-1829).

4. Let G be the set of all 2 × 2 matrices with coefficients in $\mathbb{Z}_2$ with
determinant ≠ 0. Assuming that multiplication is associative, show
that G is a group of order 6. Next, set


a(t} a(t) 


1 0 


Show that $A^3 = B^2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ (the identity of G), and that

$$BAB = A^{-1}.$$


5. Let (G,*) be a group such that for every g € G, g? = e. Prove 
that G must be an Abelian group. 


6. Let $\mathbb{Z}_3$ be the integers modulo 3 and consider the set U of matrices
with entries in $\mathbb{Z}_3$, defined by setting

$$U = \left\{ \begin{pmatrix} 1 & a & b \\ 0 & 1 & c \\ 0 & 0 & 1 \end{pmatrix} \;\middle|\; a, b, c \in \mathbb{Z}_3 \right\}.$$

(a) Show that U is a group relative to ordinary matrix
multiplication.

(b) Show that |U| = 27.

(c) Show that for every element $x \in U$, $x^3 = e$, where e is the
identity of U.

(d) Show that U is not Abelian.


7. Let (G, ∗) be a group such that $|G| \le 4$. Prove that G must be
Abelian.

8. Let (G, ∗) be a group such that for all $a, b \in G$ we have
$(ab)^2 = a^2 b^2$. Prove that G must be Abelian. Find elements $a, b \in \mathrm{Sym}(X)$
in Exercise 3 above such that $(ab)^2 \ne a^2 b^2$.

9. Let A be a set. Show that $(2^A, +)$ is an Abelian group, but that
if $|A| \ge 2$ then $(2^A, \cap)$ and $(2^A, \cup)$ are not groups at all.

10. Here's another proof of the fact that if p is prime, then every
element $a \in \mathbb{Z}_p^*$ has an inverse. Given that a, p are relatively
prime, then by the Euclidean trick (page 58) there exist integers s
and t with sa + tp = 1. Now what?


4.2.5 Cyclic groups 


At the end of the previous subsection we observed that the
multiplicative group $(\mathbb{Z}_7^*, \cdot)$ has every element representable as a power
of the element 3. This is a very special property, which we formalize as
follows.

Definition of Cyclic Group. Let (G, ∗) be a group. If there exists an
element $x \in G$ such that every element of G is a power (possibly negative)
of x, then (G, ∗) is called a cyclic group, and the element x is called
a generator of G. Note that a cyclic group is necessarily Abelian. To
see this, assume that the group G is cyclic with generator x and that
$g, g' \in G$. Then $g = x^m$ and $g' = x^n$ for suitable powers m, n, and so

$$g g' = x^m x^n = x^{m+n} = x^{n+m} = x^n x^m = g' g,$$

proving that G is Abelian.
Let’s look at a few examples. 
1. The infinite additive group $(\mathbb{Z}, +)$ is cyclic, with generator 1. Note,
however, in this context, we wouldn't write $1^n$ for powers of 1 as
this notation is suggestive of multiplication and $1^n = 1$. Rather,
in this additive setting we write

$$n \cdot 1 = \underbrace{1 + 1 + \cdots + 1}_{n \text{ terms}}.$$

As any integer can be written as a positive or negative multiple of 1
("multiple" is the additive version of "power"), we conclude that
$(\mathbb{Z}, +)$ is an (infinite) cyclic group.

2. If n is a positive integer, then the additive group $(\mathbb{Z}_n, +)$ is cyclic.
Notice here that we don't really need any negative multiples of 1
to obtain all of $\mathbb{Z}_n$. One easy way to see this is that −1 = (n − 1)1
and so if $[a] \in \mathbb{Z}_n$, then −[a] = a(n − 1)1.

3. If p is prime, then the multiplicative group $(\mathbb{Z}_p^*, \cdot)$ is cyclic. While
not a deep fact, this is not easy to show using only what we've
learned up to this point.¹¹ As examples, note that $\mathbb{Z}_5^*$ is cyclic,
with generator 2, as $2^1 = 2$, $2^2 = 4$, $2^3 = 8 = 3$, and $2^4 = 16 = 1$.
Next, $\mathbb{Z}_7^*$ is cyclic, with generator 3, as

$$3^1 = 3,\quad 3^2 = 9 = 2,\quad 3^3 = 6,\quad 3^4 = 4,\quad 3^5 = 5,\quad 3^6 = 1.$$

Note, however, that while 2 is a generator of $\mathbb{Z}_5^*$, it is not a
generator of $\mathbb{Z}_7^*$.

Related to the above is the following famous unsolved conjecture:
that the congruence class of the integer "2" is a generator of
$\mathbb{Z}_p^*$ for infinitely many primes p. This is often called the Artin
Conjecture, and the answer is "yes" if one knows that the so-called
Generalized Riemann Hypothesis is true! Try checking
this out for the first few primes, noting (as above) that 2 is not a
generator for $\mathbb{Z}_7^*$.

¹¹Most proofs proceed along the following lines. One argues that if $\mathbb{Z}_p^*$ is not cyclic, then there
will have to exist a proper divisor k of p − 1 such that every element x of $\mathbb{Z}_p^*$ satisfies $x^k = 1$.
However, this can be interpreted as a polynomial equation of degree k which has (p − 1) > k
solutions. Since $(\mathbb{Z}_p, +, \cdot)$ can be shown to be a "field," one obtains a contradiction.

4. Let n be a positive integer and consider the set of complex numbers

$$C_n = \{e^{2\pi i k/n} = \cos 2\pi k/n + i \sin 2\pi k/n \mid k = 0, 1, 2, \dots, n-1\} \subseteq \mathbb{C}.$$

If we set $\zeta = e^{2\pi i/n}$, then $e^{2\pi i k/n} = \zeta^k$. Since also $\zeta^n = 1$ and
$\zeta^{-1} = \zeta^{n-1}$ we see that not only is $C_n$ closed under multiplication,
it is, in fact, a cyclic group.
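The suggestion in example 3 to "try checking this out for the first few primes" can be carried out by brute force. The following sketch lists the successive powers of an element modulo a prime and collects the generators; the function names `powers` and `generators` and the particular list of small primes are assumptions made for the illustration.

```python
def powers(x, p):
    """The list x^1, x^2, ..., x^(p-1) reduced modulo p."""
    out, y = [], 1
    for _ in range(p - 1):
        y = (y * x) % p
        out.append(y)
    return out

def generators(p):
    """Elements of {1, ..., p-1} whose powers exhaust all p-1 nonzero residues."""
    return [x for x in range(1, p) if len(set(powers(x, p))) == p - 1]

print(powers(3, 7))       # the powers of 3 modulo 7
print(generators(5), generators(7))

small_primes = [3, 5, 7, 11, 13, 17, 19]
print([p for p in small_primes if 2 in generators(p)])
```

Among these small primes, 2 turns out to be a generator for some and not for others (7 and 17 fail), which gives a feel for why the conjecture is about "infinitely many" primes rather than all of them.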


We hasten to warn the reader that in a cyclic group the generator is
almost never unique. Indeed, the inverse of any generator is certainly
also a generator, but there can be even more. For example, it is easy
to check that every non-identity element of the additive cyclic group
$(\mathbb{Z}_5, +)$ is a generator. This follows by noting that 1 is a generator
and that

$$1 = 3 \cdot 2 = 2 \cdot 3 = 4 \cdot 4 \quad\text{in } \mathbb{Z}_5.$$

On the other hand, we showed above that 3 is a generator of the cyclic
group $\mathbb{Z}_7^*$, and since $3^{-1} = 5$ (because 3 · 5 = 1), we see that 5 is also
a generator. However, these can be shown to be the only generators of
$\mathbb{Z}_7^*$. In general, if G is a cyclic group of order n, then the number of
generators of G is $\phi(n)$, where, as usual, $\phi$ is the Euler $\phi$-function; see
Exercise 6, below.
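The count of generators can be checked numerically for the additive groups of integers modulo n. This is a sketch only: `additive_generators` and `phi` are hypothetical helper names, and the comparison over n = 1, ..., 29 is evidence for the claim, not a proof of it.

```python
from math import gcd

def additive_generators(n):
    """Elements a of Z_n whose multiples 0*a, 1*a, ..., (n-1)*a cover all of Z_n."""
    everything = set(range(n))
    return [a for a in range(n) if {(k * a) % n for k in range(n)} == everything]

def phi(n):
    """Euler's phi-function: how many of 1..n are relatively prime to n."""
    return sum(1 for a in range(1, n + 1) if gcd(a, n) == 1)

print(additive_generators(5))   # every non-identity element
print(additive_generators(6))
print(all(len(additive_generators(n)) == phi(n) for n in range(1, 30)))
```

In each case tested, the generators are exactly the classes relatively prime to n.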


In fact, we'll see in the next section that if (G,*) is a group of prime 
order p, then not only is G cyclic, every non-identity element of G is a 
generator. 


We shall conclude this section with a useful definition. Let $(G, *)$ be
a group, and let $g \in G$. The order of $g$ is the least positive integer $n$
such that $g^n = e$. We denote this integer by $o(g)$. If no such integer
exists, we say that $g$ has infinite order, and write $o(g) = \infty$. Therefore,
for example, in the group $\mathbb{Z}_7^*$, we have

$$o(1) = 1,\quad o(2) = 3,\quad o(3) = 6,\quad o(4) = 3,\quad o(5) = 6,\quad o(6) = 2.$$

Note that if the element $g$ has order $n$, then

$$\{g^k \mid k \in \mathbb{Z}\} = \{e, g, g^2, \ldots, g^{n-1}\},$$

and all of the elements of $\{e, g, g^2, \ldots, g^{n-1}\}$ are distinct. To see
this, note that when we divide any integer $k$ by $n$ we may produce a
quotient $q$ and a remainder $r$, where $0 \le r \le n-1$. In other words
we may express $k = qn + r$, which implies that $g^k = g^{qn+r} = g^{qn}g^r =
(g^n)^q g^r = e g^r = g^r$. Therefore we already conclude that $\{g^k \mid k \in \mathbb{Z}\} =
\{e, g, g^2, \ldots, g^{n-1}\}$. Next, if $e, g, g^2, \ldots, g^{n-1}$ aren’t all distinct, then
there must exist integers $k < m$, $0 \le k < m \le n-1$, such that $g^k = g^m$.
But then $e = g^m g^{-k} = g^{m-k}$. But clearly $0 < m-k \le n-1$, which
contradicts the definition of the order of $g$. This proves our assertion.


Note that in general, o(g) = 1 precisely when g = e, the identity 
element of G. Also, if G is a finite group with n elements, and if G 
has an element g of order n, then G is cyclic and g is a generator of G 
(Exercise 4). 
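The definition of $o(g)$ translates directly into a search for the first power equal to the identity. The sketch below is an illustration under stated assumptions: `element_order` is a hypothetical helper, the group is passed in as a binary operation plus its identity, and "infinite order" can only be reported as "none found up to a bound."

```python
def element_order(g, op, identity, bound=10_000):
    """Least n >= 1 with g^n = identity, or None if none is found up to bound."""
    power = g
    for n in range(1, bound + 1):
        if power == identity:
            return n
        power = op(power, g)
    return None

# Nonzero classes modulo 7 under multiplication.
mul7 = lambda a, b: (a * b) % 7
orders = {g: element_order(g, mul7, 1) for g in range(1, 7)}
print(orders)

# The integer 2 under ordinary multiplication never returns to 1.
print(element_order(2, lambda a, b: a * b, 1, bound=50))
```

Note that an element of order 6 in this group of 6 elements is a generator, in line with the remark above.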


EXERCISES 


1. The two groups you computed in Exercise 1 of Subsection 4.2.4 
both have order 4: one is cyclic and one is not. Which one is 
cyclic? What are the generators of this group? 


2. Let $G$ be a group and let $g$ be an element of finite order $n$. Show
that if $g^m = e$ then $m$ must be a multiple of $n$, i.e., $n \mid m$.


3. Assume that $G$ is a group and that $g \in G$ is an element of finite
order $n$. Assume that $k$ is a positive integer which is relatively
prime to $n$ (see page 60). Show that the element $g^k$ also has order
$n$.


4. Let $G$ be a finite group of order $n$, and assume that $G$ has an
element $g$ of order $n$. Show that $G$ is a cyclic group and that $g$ is
a generator.


5. Let $G$ be a finite cyclic group of order $n$ and assume that $x$ is
a generator of $G$. Show that $|G| = o(x)$, i.e., the order of the
group $G$ is the same as the order of the element $x$.


6. Let $G$ be a cyclic group of order $n$. Show that the number of
generators of $G$ is $\phi(n)$. (Hint: let $x \in G$ be a fixed generator;
therefore, any element of $G$ is of the form $x^k$ for some integer
$k$, $0 \le k \le n-1$. Show that $x^k$ is also a generator if and only if
$k$ and $n$ are relatively prime.)


4.2.6 Subgroups 


Most important groups actually appear as “subgroups” of larger groups; 
we shall try to get a glimpse of how such a relationship can be exploited. 


Definition. Let $(G, *)$ be a group and let $H \subseteq G$ be a subset of $G$.
We say that $H$ is a subgroup of $G$ if


(i) H is closed under the operation *, and 


(ii) (H, *) is also a group. 


Interestingly enough, the condition (i) above (closure) is almost
enough to guarantee that a subset $H \subseteq G$ is actually a subgroup.
There are two very useful and simple criteria, each of which guarantees
that a given subset is actually a subgroup.


Proposition. Let $(G, *)$ be a group and let $H \subseteq G$ be a non-empty
subset.


(a) If for any pair of elements $h, h' \in H$, $h^{-1}h' \in H$, then $H$ is a
subgroup of $G$.

(b) If $|H| < \infty$ and $H$ is closed under $*$, then $H$ is a subgroup of $G$.


Proof. Notice first that we don’t have to check the associativity of $*$,
as this is already inherited from the “parent” group $G$. Now assume
condition (a). Since $H$ is non-empty, we may choose an element $h \in H$.
By condition (a), we know that $e = h^{-1}h \in H$, and so $H$ contains the
identity element of $G$ (which is therefore also the identity element of
$H$). Next, given $h \in H$ we appeal again to condition (a) to obtain
(since $e \in H$) $h^{-1} = h^{-1} * e \in H$. Finally, if $h, h' \in H$, then $h^{-1} \in H$,
and so $h * h' = (h^{-1})^{-1} * h' \in H$, which shows that $H$ is closed under $*$.
It follows that $H$ is a subgroup of $G$.

Next, assume condition (b), and let $h \in H$. Since $H$ is closed under
$*$, we conclude that all of the powers $h, h^2, h^3, \ldots$ are in $H$.
Since $H$ is a finite set, it is impossible for all of these elements to be
distinct, meaning that there must be powers $m < n$ with $h^m = h^n$.
This implies that $e = h^{n-m} \in H$, forcing $H$ to contain the identity of
$G$. Furthermore, the same equation shows that $e = h^{n-m-1} * h$,
where $n-m-1 \ge 0$. Therefore, $h^{n-m-1} \in H$ and $e = h^{n-m-1} * h$ implies
that $h^{-1} = h^{n-m-1} \in H$. Therefore, we have shown that $H$ contains
both the identity and the inverses of all of its elements, forcing $H$ again
to be a subgroup of $G$.
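Criterion (b) can be tested exhaustively in a small case. The sketch below is an illustration only, using the additive group $(\mathbb{Z}_{12}, +)$ as an assumed example: it runs over every non-empty subset, keeps the ones closed under addition, and checks that each of them automatically contains the identity and all inverses.

```python
from itertools import combinations

n = 12  # work in the additive group of integers modulo 12

def is_closed(H):
    """True when a + b (mod n) stays in H for all a, b in H."""
    return all((a + b) % n in H for a in H for b in H)

def is_subgroup(H):
    """Closure, identity and inverses, checked directly."""
    return is_closed(H) and 0 in H and all((-a) % n in H for a in H)

closed_subsets = [
    set(c)
    for r in range(1, n + 1)
    for c in combinations(range(n), r)
    if is_closed(set(c))
]
print(sorted(len(H) for H in closed_subsets))
print(all(is_subgroup(H) for H in closed_subsets))
```

Finiteness matters here: the positive integers are closed under addition inside $(\mathbb{Z}, +)$ but do not form a subgroup.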


I can’t overestate how useful the above result is! 


One very easy way to obtain a subgroup of a given group $(G, *)$ is to
start with an element $x \in G$ and form the set $H = \{x^k \mid k \in \mathbb{Z}\}$; that
is, $H$ contains all the positive and negative powers of $x$. Clearly $H$
satisfies condition (a) of the above proposition since $x^m, x^n \in H \Rightarrow
(x^m)^{-1}x^n = x^{n-m} \in H$. Therefore $H$ is a subgroup of $G$; as $H$ is cyclic,
we say that $H$ is the cyclic subgroup of $G$ generated by $x$. This
cyclic subgroup generated by $x$ is often denoted $\langle x \rangle$.


EXERCISES 


1. Let $G$ be a group, and let $H_1$ and $H_2$ be subgroups. Prove that
the intersection $H_1 \cap H_2$ is also a subgroup of $G$.


2. Let $G$ be a group, and let $H_1$ and $H_2$ be subgroups. Prove that,
unless $H_1 \subseteq H_2$ or $H_2 \subseteq H_1$, the union $H_1 \cup H_2$ is not a subgroup of $G$.


3. Show that the even integers $2\mathbb{Z}$ form a subgroup of the additive group
of the integers $(\mathbb{Z}, +)$. In fact, show that if $n$ is any positive integer,
then the set $n\mathbb{Z}$ of multiples of $n$ is a subgroup of $(\mathbb{Z}, +)$.


4. Show that any subgroup $H$ of the additive group $(\mathbb{Z}, +)$ of the
integers must be cyclic.


5. Show that any subgroup $H \ne \{0\}$ of the additive group $(\mathbb{C}, +)$ of
complex numbers must be infinite.


6. Consider the group $G = \mathrm{GL}_2(\mathbb{R})$ of $2 \times 2$ matrices of non-zero
determinant. Find an element (i.e., a matrix) $A$ of finite order
and an element $B$ of infinite order. Conclude that $G$ has both
finite and infinite subgroups.


7. Let $X = \{1, 2, 3, 4\}$ and set $G = \mathrm{Sym}(X)$, the group of permutations
of $X$. Find all of the elements in $G$ having order 2. Find all
of the elements of $G$ having order 3. Find all of the elements of $G$
having order 4.


8. Let $(G, *)$ be a cyclic group and let $H \subseteq G$, $H \ne \{e\}$, be a subgroup.
Show that $H$ is also cyclic. (This is not entirely trivial!
Here’s a hint as to how to proceed. Let $G$ have generator $x$ and
let $n$ be the smallest positive integer such that $x^n \in H$. Show
that, in fact, $x^n$ is a generator of $H$.)


9. Consider the set $\mathbb{R}^+$ of positive real numbers and note that $(\mathbb{R}^+, \cdot)$
is a group, where “$\cdot$” denotes ordinary multiplication. Show that
$\mathbb{R}^+$ has elements of finite order as well as elements of infinite order
and hence has both finite and infinite subgroups.


10. Consider a graph with set $X$ of vertices, and let $G$ be the automorphism
group of this graph. Now fix a vertex $x \in X$ and set
$G_x = \{\sigma \in G \mid \sigma(x) = x\}$. Prove that $G_x$ is a subgroup of $G$, often
called the stabilizer in $G$ of the vertex $x$.


11. Find the orders of each of the elements in the cyclic group $(\mathbb{Z}_{12}, +)$.


12. Let $p$ be a prime number, let $\mathbb{Z}_p$ be the integers modulo $p$,
and consider the group $\mathrm{GL}_2(\mathbb{Z}_p)$ of $2 \times 2$ matrices having entries in $\mathbb{Z}_p$
and all having nonzero determinant.

(a) Show that $\mathrm{GL}_2(\mathbb{Z}_p)$ is a group.

(b) Show that there are $p(p + 1)(p - 1)^2$ elements in this group.¹²

(c) Let $B$ be the set of upper triangular matrices inside $\mathrm{GL}_2(\mathbb{Z}_p)$.
Therefore,

$$B = \left\{ \begin{pmatrix} a & b \\ 0 & c \end{pmatrix} \;\middle|\; ac \ne 0 \right\} \subseteq \mathrm{GL}_2(\mathbb{Z}_p).$$

Show that $B$ is a subgroup of $\mathrm{GL}_2(\mathbb{Z}_p)$, and show that $|B| = p(p - 1)^2$.

(d) Define $U \subseteq B$ to consist of matrices with 1s on the diagonal.
Show that $U$ is a subgroup of $B$ and consists of $p$ elements.


4.2.7 Lagrange’s theorem 


In this subsection we shall show a potentially surprising fact, namely
that if $H$ is a subgroup of the finite group $G$, then the order $|H|$ evenly
divides the order $|G|$ of $G$. This severely restricts the nature of subgroups
of $G$.


The fundamental idea rests on an equivalence relation in the given
group, relative to a subgroup. This relationship is very similar to
the congruence relation (mod $n$) on the additive group $\mathbb{Z}$ of integers.
Thus, let $(G, *)$ be a group and let $H \subseteq G$ be a subgroup. Define a
relation on $G$, denoted (mod $H$), by stipulating that

$$g \equiv g' \pmod{H} \iff g^{-1}g' \in H.$$

It is easy to show that this is an equivalence relation:


reflexivity: $g \equiv g \pmod{H}$ since $g^{-1}g = e \in H$.


symmetry: If $g \equiv g' \pmod{H}$ then $g^{-1}g' \in H$, and so $g'^{-1}g =
(g^{-1}g')^{-1} \in H$. Therefore, also $g' \equiv g \pmod{H}$.


transitivity: If $g \equiv g' \pmod{H}$ and $g' \equiv g'' \pmod{H}$, then $g^{-1}g'$,
$g'^{-1}g'' \in H$. But then $g^{-1}g'' = g^{-1}g'g'^{-1}g'' \in H$, proving that also
$g \equiv g'' \pmod{H}$.


Pretty easy, eh?


¹²This takes a little work. However, notice that a matrix of the form $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ will have nonzero
determinant precisely when not both $a$ and $b$ are 0 and when the “vector” $(c, d)$ is not a multiple
of the “vector” $(a, b)$. This implies that there are $p^2 - 1$ possibilities for the first row of the matrix
and $p^2 - p$ possibilities for the second row. Now put this together!


As a result we see that $G$ is partitioned into mutually disjoint equivalence
classes. Next we shall actually determine what these equivalence
classes look like. Thus let $g \in G$ and let $[g]$ be the equivalence class
(relative to the above equivalence relation) containing $g$.


We Claim: $[g] = gH = \{gh \mid h \in H\}$.


Proof of Claim: Note first that an element of $gH$ looks like $gh$, for
some $h \in H$. Since $g^{-1}(gh) = h \in H$ we see that $g \equiv gh \pmod{H}$,
i.e., $gh \in [g]$. This proves that $gH \subseteq [g]$. Conversely, assume that
$g' \in [g]$, i.e., that $g \equiv g' \pmod{H}$, which says that $g^{-1}g' \in H$.
But then $g^{-1}g' = h$ for some $h \in H$, and so $g' = gh \in gH$. This proves that $[g] \subseteq gH$
and so $[g] = gH$.


Next we would like to show that if $H$ is a finite subgroup of $G$ then the
equivalence classes $gH$, $g \in G$, all have the same number of
elements. In fact, we shall show that $|gH| = |H|$, for each $g \in G$. To
prove this we shall define a mapping $f : H \to gH$ and show that it is
a bijection. Namely, we define $f(h) = gh$, $h \in H$.


$f$ is one-to-one: If $h, h' \in H$ and if $f(h) = f(h')$, then $gh = gh'$. We
now multiply each side by $g^{-1}$ and get $h = g^{-1}gh = g^{-1}gh' = h'$.
Thus $f$ is one-to-one.


$f$ is onto: If $gh \in gH$, then $gh = f(h)$ and so $f$ is onto.


It follows, therefore, that $|gH| = |H|$ for each element $g \in G$. If $G$
is also a finite group, this says that $G$ is partitioned into sets, each of
which has cardinality $|H|$. If $G$ is partitioned into $k$ such sets, then
obviously $|G| = k|H|$, which proves that $|H|$ is a divisor of the group
order $|G|$.


We summarize the above in the following theorem. 


Lagrange’s Theorem. Let $G$ be a finite group and let $H$ be a subgroup
of $G$. Then $|H|$ divides $|G|$.


If $G$ is a finite group and $g \in G$, then we have seen that $o(g)$ is the
order of the subgroup $\langle g \rangle$ that it generates. Therefore,


Corollary. If $G$ is a finite group and $g \in G$, then $o(g)$ divides $|G|$.
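The coset argument behind Lagrange’s Theorem can be watched in a small case. The following sketch makes its own choices, which are not taken from the text: $G$ is the group of all permutations of $\{0, 1, 2\}$, stored as tuples, and $H$ is the two-element subgroup generated by the transposition swapping 0 and 1. It forms every left coset $gH$ and confirms that they are equal-sized and partition $G$.

```python
from itertools import permutations

def compose(p, q):
    """The product pq as functions: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

G = list(permutations(range(3)))          # all 6 permutations of {0, 1, 2}
H = [(0, 1, 2), (1, 0, 2)]                # identity and the swap of 0 and 1

# Collect the distinct left cosets gH; frozenset discards repeats.
cosets = {frozenset(compose(g, h) for h in H) for g in G}

for coset in cosets:
    print(sorted(coset))
print(len(G), "=", len(cosets), "x", len(H))
```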


Example. Consider the graph given to the right, with four vertices, and let $G$
be the automorphism group of this graph. Notice that if $X = \{1, 2, 3, 4\}$, then $G$ is
a subgroup of $\mathrm{Sym}(X)$, the group of permutations of the four vertices. Therefore,
we infer immediately that $|G|$ is a divisor of $4! = 24$.

[Figure: a square graph on the vertices 1, 2, 3, 4.]

Note that two very obvious automorphisms of this graph are the
permutations

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 1 \end{pmatrix}, \qquad \tau = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 2 & 1 & 4 & 3 \end{pmatrix}.$$

Next, note that $\sigma$ has order 4 and $\tau$ has order 2. Finally, note that
$\tau\sigma\tau = \sigma^3$ $(= \sigma^{-1})$. Let $C = \langle\sigma\rangle$ and $D = \langle\tau\rangle$ be the cyclic
subgroups generated by $\sigma$ and $\tau$. Note that since $\tau \notin C$, we conclude
that $|G| > 4 = |C|$; since by Lagrange’s Theorem $|C|$ must divide $|G|$,
we see that $|G|$ is a multiple of 4 and is strictly larger
than 4. Therefore $|G| \ge 8$. Also since $G$ is a subgroup of $\mathrm{Sym}(X)$, we
see that $|G|$ divides 24. But there are plenty of permutations of the vertices
1, 2, 3, and 4 which are not graph automorphisms (find one!), and so
$|G| < 24$.


On the other hand, note that the powers $e, \sigma, \sigma^2, \sigma^3$ give four
automorphisms of the graph, and the elements $\tau, \sigma\tau, \sigma^2\tau, \sigma^3\tau$ give four
more. Furthermore, since $\tau\sigma\tau = \sigma^3$ we can show that the set
$\{e, \sigma, \sigma^2, \sigma^3, \tau, \sigma\tau, \sigma^2\tau, \sigma^3\tau\}$ is closed under multiplication and hence
is a subgroup of $G$. Therefore 8 divides $|G|$ and so it follows that $|G| = 8$ and
the above set is all of $G$:

$$G = \{e, \sigma, \sigma^2, \sigma^3, \tau, \sigma\tau, \sigma^2\tau, \sigma^3\tau\}.$$


Below is the multiplication table for $G$ (you can fill in the missing
elements). Notice that $G$ has quite a few subgroups; you should be
able to find them all (Exercise 3).


  ∘   |  e    σ    σ²   σ³   τ    στ   σ²τ  σ³τ
 -----+------------------------------------------
  e   |  e    σ    σ²   σ³   τ    στ   σ²τ  σ³τ
  σ   |  σ    σ²   σ³   e
  σ²  |  σ²   σ³   e    σ
  σ³  |  σ³   e    σ    σ²
  τ   |  τ
  στ  |  στ
  σ²τ |
  σ³τ |
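A complete multiplication table can be generated mechanically, which is a convenient way to check a table filled in by hand. The sketch below rests on several assumptions that are not fixed by the text: the square's vertices are labeled 1, 2, 3, 4 in cyclic order, $\sigma$ is the rotation $1 \to 2 \to 3 \to 4 \to 1$, $\tau$ is the reflection swapping $1 \leftrightarrow 2$ and $3 \leftrightarrow 4$, and a product $xy$ means "apply $y$ first, then $x$." The labels `s`, `t`, `s2t`, and so on are ad hoc names for $\sigma$, $\tau$, $\sigma^2\tau$.

```python
def compose(x, y):
    """Product xy, read as: apply y first, then x (an assumed convention)."""
    return tuple(x[y[i] - 1] for i in range(4))

e = (1, 2, 3, 4)
sigma = (2, 3, 4, 1)   # images of 1, 2, 3, 4 under the rotation
tau = (2, 1, 4, 3)     # assumed reflection: swaps 1<->2 and 3<->4

def power(x, k):
    """k-th power of x, with power(x, 0) equal to the identity."""
    result = e
    for _ in range(k):
        result = compose(result, x)
    return result

labels = ["e", "s", "s2", "s3", "t", "st", "s2t", "s3t"]
elements = [power(sigma, k) for k in range(4)]
elements += [compose(power(sigma, k), tau) for k in range(4)]
name_of = dict(zip(elements, labels))

# Entry in row x, column y is the product xy.
table = [[name_of[compose(x, y)] for y in elements] for x in elements]
for label, row in zip(labels, table):
    print(f"{label:>4} | " + " ".join(f"{entry:>4}" for entry in row))
```

With the opposite composition convention the table is transposed, so it is worth fixing the convention before comparing against hand computations.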


EXERCISES 


1. Use the corollary on page 233 to give another proof of Fermat’s 
Little Theorem; see page 86. 


2. Suppose that G is a finite group of prime order p. Prove that G 
must be cyclic. 


3. Refer to the multiplication table above for the group $G$ of symmetries
of the square and list all of the subgroups.


4. Let $G$ be a group, let $H$ be a subgroup, and recall the equivalence
relation (mod $H$) defined by

$$g \equiv g' \pmod{H} \iff g^{-1}g' \in H.$$

The equivalence classes in $G$ relative to this equivalence relation
are called the (left) cosets of $H$ in $G$. Are the cosets also subgroups
of $G$? Why or why not?


5. Let $G$ be the group of Exercise 3 and let $K$ be the cyclic subgroup
generated by $\sigma\tau$. Compute the left cosets of $K$ in $G$.


6. Let $G$ be the group of Exercise 3 and let $L$ be the subgroup
$\{e, \tau, \sigma^2, \sigma^2\tau\}$. Compute the left cosets of $L$ in $G$.


7. Here we shall give yet another proof of the infinitude of primes.
Define, for each prime $p$, the corresponding Mersenne number by
setting $M_p = 2^p - 1$ (these are often primes themselves). Assume
by contradiction that there are only finitely many primes and let
$p$ be the largest prime. Let $q$ be a prime divisor of $M_p = 2^p - 1$.
Then we have, in the multiplicative group $\mathbb{Z}_q^*$ of nonzero integers
modulo $q$, that $2^p \equiv 1 \pmod{q}$. This says, by Exercise 2 on page 227,
that $p$ is the order of 2 in the group $\mathbb{Z}_q^*$. Apply Lagrange’s theorem
to obtain $p \mid (q - 1)$, proving in particular that $q$ is a larger prime
than $p$, a contradiction.


4.2.8 Homomorphisms and isomorphisms 


What is the difference between the additive group $(\mathbb{Z}_6, +)$ and the
multiplicative group $(\mathbb{Z}_7^*, \cdot)$? After all, they are both cyclic: $(\mathbb{Z}_6, +)$ has
generator 1 (actually, [1]), and $(\mathbb{Z}_7^*, \cdot)$ has generator 3 ([3]). So wouldn’t
it be more sensible to regard these two groups as algebraically the same,
the only differences being purely cosmetic? Indeed, doesn’t any cyclic
group of order 6 look like $\{e, x, x^2, x^3, x^4, x^5\}$, where $x^6 = e$?

Here’s a much less obvious example. Consider the two infinite groups
$(\mathbb{R}, +)$ and $(\mathbb{R}^+, \cdot)$. At first blush these would seem quite different.
Yet, if we consider the mapping $f : \mathbb{R} \to \mathbb{R}^+$ given by $f(x) = e^x$ (the
exponential function) then $f$ is not only a bijection (having inverse $\ln$)
but this mapping actually matches up the two binary operations:

$$f(x + y) = e^{x+y} = e^x \cdot e^y = f(x) \cdot f(y).$$

Notice that the inverse mapping $g(x) = \ln x$ does the same, but in the
reverse order:

$$g(x \cdot y) = \ln(x \cdot y) = \ln x + \ln y = g(x) + g(y).$$

The point of the above is that through the mapping $f$ and its inverse
$g$ we see that the group structure of $(\mathbb{R}^+, \cdot)$ is faithfully represented by the
group structure of $(\mathbb{R}, +)$, i.e., the two groups are “isomorphic.” We
shall formalize this concept below.


Definition of Homomorphism: Let $(G, *)$ and $(H, \star)$ be groups, and
let $f : G \to H$ be a mapping. We say that $f$ is a homomorphism
if for all $g, g' \in G$ we have $f(g * g') = f(g) \star f(g')$. In other words,
in finding the image of the product of elements $g, g' \in G$, it doesn’t
matter whether you first compute the product $g * g'$ in $G$ and then
apply $f$, or first apply $f$ to $g$ and $g'$ and then compute the product
$f(g) \star f(g')$ in $H$.


Of course, we now see that the exponential mapping from $(\mathbb{R}, +)$ to
$(\mathbb{R}^+, \cdot)$ is a homomorphism.


Here’s another example. Recall the group $\mathrm{GL}_2(\mathbb{R})$ of $2 \times 2$ matrices
having real coefficients and non-zero determinants. Since we know that
$\det(A \cdot B) = \det(A) \cdot \det(B)$ we see that $\det : \mathrm{GL}_2(\mathbb{R}) \to \mathbb{R}^*$ is a
homomorphism, where $\mathbb{R}^*$ denotes the multiplicative group of non-zero
real numbers.


Definition of Isomorphism: If $f : G \to H$ is a homomorphism of
groups $(G, *)$ and $(H, \star)$, we say that $f$ is an isomorphism if $f$ is
bijective. Note that in this case, the inverse mapping $f^{-1} : H \to G$ is
also a homomorphism. The argument is as follows: if $h, h' \in H$ then
watch this:

$$\begin{aligned}
f\bigl(f^{-1}(h) * f^{-1}(h')\bigr) &= f(f^{-1}(h)) \star f(f^{-1}(h')) && \text{(since $f$ is a homomorphism)} \\
&= h \star h' && \text{(since $f$ and $f^{-1}$ are inverse functions)} \\
&= f\bigl(f^{-1}(h \star h')\bigr) && \text{(same reason!)}
\end{aligned}$$

However, since $f$ is one-to-one, we infer from the above that

$$f^{-1}(h) * f^{-1}(h') = f^{-1}(h \star h'),$$

i.e., that $f^{-1}$ is a homomorphism.


Before going any further, a few comments about homomorphisms
are needed here. Namely, let $G_1$ and $G_2$ be groups (we don’t need
to emphasize the operations here), and assume that $e_1$ and $e_2$ are the
identity elements of $G_1$ and $G_2$, respectively. Assume that $f : G_1 \to G_2$
is a homomorphism. Then,


$f(e_1) = e_2$. This is because $f(e_1)^2 = f(e_1)f(e_1) = f(e_1e_1) = f(e_1)$.
Now multiply both sides by $f(e_1)^{-1}$ and get $f(e_1) = e_2$.


If $x \in G_1$, then $f(x^{-1}) = f(x)^{-1}$. Note that by what we just proved,
$e_2 = f(e_1) = f(xx^{-1}) = f(x)f(x^{-1})$. Now multiply both sides by
$f(x)^{-1}$ and get $f(x)^{-1} = f(x)^{-1}e_2 =
f(x)^{-1}\bigl(f(x)f(x^{-1})\bigr) = \bigl(f(x)^{-1}f(x)\bigr)f(x^{-1}) = e_2f(x^{-1}) = f(x^{-1})$.


Theorem. Let $(G, *)$ and $(H, \star)$ be cyclic groups of the same order $n$.
Then $(G, *)$ and $(H, \star)$ are isomorphic.


Proof. Let $G$ have generator $x$ and let $H$ have generator $y$. Define
the mapping $f : G \to H$ by setting $f(x^k) = y^k$, $k = 0, 1, 2, \ldots, n-1$.
Note that $f$ is obviously onto. But since $|G| = |H| = n$ it is obvious
that $f$ is also one-to-one. (Is this obvious?) Finally, let $x^k, x^l \in G$;
if $k + l \le n-1$, then $f(x^kx^l) = f(x^{k+l}) = y^{k+l} = y^ky^l = f(x^k)f(x^l)$.
However, if $k + l \ge n$, then we need to divide $n$ into $k + l$ and get a
remainder $r$, where $0 \le r \le n-1$, say $k + l = qn + r$, where $q$ is the
quotient and $r$ is the remainder. We therefore have that

$$\begin{aligned}
f(x^kx^l) &= f(x^{k+l}) = f(x^{qn+r}) = f(x^{qn}x^r) \\
&= f(e_Gx^r) && \text{(since $x^n = e_G$, the identity of $G$)} \\
&= f(x^r) \\
&= y^r && \text{(by definition of $f$)} \\
&= e_Hy^r && \text{(where $e_H$ is the identity of $H$)} \\
&= y^{qn}y^r && \text{(since $y^n = e_H$)} \\
&= y^{qn+r} = y^{k+l} = y^ky^l \\
&= f(x^k)f(x^l).
\end{aligned}$$
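The mapping in this proof can be written out concretely for the two cyclic groups of order 6 that opened the subsection. The sketch below is an illustration only: it takes the generator 1 of $(\mathbb{Z}_6, +)$ and the generator 3 of the nonzero classes modulo 7, so that $f(k) = 3^k \bmod 7$, and checks by brute force that $f$ is a bijection that respects the two operations.

```python
n, p, y = 6, 7, 3          # y = 3 generates the nonzero classes modulo 7

# f sends the generator 1 of (Z_6, +) to the generator 3: f(k) = 3^k mod 7.
f = {k: pow(y, k, p) for k in range(n)}

# Bijection onto {1, ..., 6}.
is_bijection = sorted(f.values()) == list(range(1, p))

# Homomorphism: addition mod 6 on the left becomes multiplication mod 7 on the right.
is_homomorphism = all(
    f[(k + l) % n] == (f[k] * f[l]) % p
    for k in range(n)
    for l in range(n)
)
print(f, is_bijection, is_homomorphism)
```

The reduction `(k + l) % n` plays the role of the division with remainder $k + l = qn + r$ in the proof.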


EXERCISES 


1. Let $f : G_1 \to G_2$ be a homomorphism of groups, and let $H_1 \subseteq G_1$
be a subgroup of $G_1$. Prove that the image, $f(H_1) \subseteq G_2$, is a
subgroup of $G_2$.


2. (A little harder) Let $f : G_1 \to G_2$ be a homomorphism of groups,
and let $H_2 \subseteq G_2$ be a subgroup of $G_2$. Set $f^{-1}(H_2) = \{g_1 \in
G_1 \mid f(g_1) \in H_2\}$. Prove that $f^{-1}(H_2)$ is a subgroup of $G_1$.


3. Let $\mathrm{GL}_2(\mathbb{R})$ be the group of $2 \times 2$ matrices with real coefficients
and determinant not 0, and let $S$ be a nonsingular matrix. Define
the mapping $f : \mathrm{GL}_2(\mathbb{R}) \to \mathrm{GL}_2(\mathbb{R})$ by setting $f(A) = SAS^{-1}$.
Prove that $f$ is an isomorphism of $\mathrm{GL}_2(\mathbb{R})$ onto itself.


4. Again, let $\mathrm{GL}_2(\mathbb{R})$ be the group of $2 \times 2$ matrices with real coefficients
and determinant not 0. Show that the determinant defines
a homomorphism of $\mathrm{GL}_2(\mathbb{R})$ into the multiplicative group of non-zero
real numbers.


5. (Really the same as Exercise 3) Let $G$ be any group and fix an
element $x \in G$. Prove that the mapping $f : G \to G$ defined by
setting $f(g) = xgx^{-1}$ is an isomorphism of $G$ onto itself.


6. Let $A$ be an Abelian group and let $f : A \to B$ be a surjective
homomorphism, where $B$ is also a group. Prove that $B$ is also
Abelian.


7. Let $f : G \to H$ be a homomorphism of groups and set $K = \{g \in
G \mid f(g) = e_H\}$, where $e_H$ is the identity of $H$. Prove that $K$ is a
subgroup of $G$.¹³


8. Let $X = \{1, 2, 3, \ldots, n\}$, where $n$ is a positive integer. Recall that
we have the group $(2^X, +)$, where, as usual, $2^X$ is the power set of
$X$ and $+$ is symmetric difference (see page 194). Define $f : 2^X \to
\{-1, 1\}$ (where $\{\pm 1\}$ is a group with respect to multiplication) by
setting

$$f(A) = \begin{cases} +1 & \text{if } |A| \text{ is even} \\ -1 & \text{if } |A| \text{ is odd.} \end{cases}$$

Prove that $f$ is a homomorphism.


9. Let $G$ be a group and define $f : G \to G$ by setting $f(g) = g^{-1}$.

(a) Show that $f$ is a bijection.
(b) Under what circumstances is $f$ a homomorphism?


10. Prove that the automorphism groups of the graphs on page 207
(each having four vertices) are not isomorphic.


11. Let $\mathbb{R}$ be the additive group of real numbers and assume that
$f : \mathbb{R} \to \mathbb{R}$ is a function which satisfies $f(x - y) = f(x) - f(y)$, for
$0 < x, y \in \mathbb{R}$. Prove that $f : \mathbb{R} \to \mathbb{R}$ is a homomorphism.


12. Let $\mathbb{R}$ be the additive group of real numbers and assume that
$f : \mathbb{R} \to \mathbb{R}$ is a function which satisfies the peculiar property that
$f(x^2 - y^2) = xf(x) - yf(y)$ for all $x, y \in \mathbb{R}$.

(a) Prove that $f : \mathbb{R} \to \mathbb{R}$ is a homomorphism, and that
(b) there exists a real number $c \in \mathbb{R}$ such that $f(x) = cx$, $x \in \mathbb{R}$.

The result of Exercise 12 is strictly stronger than that of Exercise 11.
Indeed the condition of Exercise 12 shows that the homomorphism
is continuous and of a special form. We’ll return to
this briefly in Chapter 5; see Exercise 6 on page 252.


¹³This subgroup of $G$ is usually called the kernel of the homomorphism $f$.


13. Let $G$ be a group and let $\mathbb{C}^*$ denote the multiplicative group of
complex numbers. By a (linear) character we mean a homomorphism
$\chi : G \to \mathbb{C}^*$, i.e., $\chi(g_1g_2) = \chi(g_1)\chi(g_2)$ for all $g_1, g_2 \in G$.


(a) Prove that if $\chi : G \to \mathbb{C}^*$ is a character, then $\chi(g^{-1}) = \overline{\chi(g)}$
(complex conjugate) for all $g \in G$.


Now assume that $G$ is finite and that $\chi : G \to \mathbb{C}^*$ is a character
such that for at least one $g \in G$, $\chi(g) \ne 1$. Prove that

(b) $\displaystyle\sum_{g \in G} \chi(g) = 0$.

(c) Let $\chi_1, \chi_2 : G \to \mathbb{C}^*$ be distinct characters and prove that
$\displaystyle\sum_{g \in G} \chi_1(g)\overline{\chi_2(g)} = 0$.

(d) Fix the positive integer $n$ and show that for any integer $k =
0, 1, 2, \ldots, n-1$ the mapping $\chi_k : \mathbb{Z}_n \to \mathbb{C}^*$ given by $\chi_k(a) =
\cos(2\pi ka/n) + i\sin(2\pi ka/n)$ is a character. Show that any
character of $\mathbb{Z}_n$ must be of the form $\chi_k$, $0 \le k < n$, as above.


4.2.9 Return to the motivation 


We return to the two graphs having six vertices each on page 206, and
make a simple observation about their automorphism groups. Namely,
for any two vertices, call them $x_1$ and $x_2$, there is always an automorphism
$\sigma$ which carries one to the other. This is certainly a property of
the graph; such graphs are called vertex-transitive graphs. A simple
example of a non vertex-transitive graph is as follows:

[Figure: Graph A, on the vertices 1, 2, 3, 4.]

Since vertex 1 is contained in three edges, and since vertex 2 is contained
in only two, it is obviously impossible to construct an automorphism which
will map vertex 1 to vertex 2.

[Figure: Graph B.]

The graph to the right doesn’t have the same deficiency as the
above graph, and yet a moment’s thought will reveal that there
cannot exist an automorphism which carries vertex 1 to vertex 2.


The following result is fundamental to the computation of the order
of the automorphism group of a vertex-transitive graph. Its usefulness
is that it reduces the computation of the size of the automorphism group
of a vertex-transitive graph to the computation of a stabilizer (which
is often much easier).¹⁴


Theorem. Let $G$ be the automorphism group of a vertex-transitive
graph having vertex set $X$. Fix $x \in X$ and let $G_x$ be the stabilizer in
$G$ of $x$. Then $|G| = |X| \cdot |G_x|$.


Proof. Let $H$ be the stabilizer in $G$ of the fixed vertex $x$: $H = \{\sigma \in
G \mid \sigma(x) = x\}$. Recall the equivalence relation on $G$ introduced in
Subsection 4.2.7, namely

$$\sigma \equiv \sigma' \pmod{H} \iff \sigma^{-1}\sigma' \in H.$$

Recall also from Subsection 4.2.7 that the equivalence classes each have
order $|H|$; if we can count the number of equivalence classes in $G$, we’ll
be done! Now define a mapping $f : G \to X$ by the rule $f(\sigma) =
\sigma(x)$. Since the graph is vertex-transitive, we see that this mapping is
surjective. Note that

$$f(\sigma) = f(\sigma') \iff \sigma(x) = \sigma'(x) \iff \sigma^{-1}\sigma'(x) = x \iff \sigma^{-1}\sigma' \in H \iff \sigma \equiv \sigma' \pmod{H}.$$

In other words, there are exactly as many equivalence classes mod $H$
as there are vertices of $X$! This proves the theorem.


¹⁴I have used this result many times in my own research!
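The counting in this proof can be observed directly on a very small vertex-transitive graph. The sketch below makes its own choice of example, a 4-cycle with edges 1-2, 2-3, 3-4, 4-1, and the function name `automorphisms` is invented for the illustration. It finds every automorphism by brute force and then sorts them according to the image of the vertex 1, which is the mapping $f(\sigma) = \sigma(x)$ used in the proof.

```python
from itertools import permutations

def automorphisms(vertices, edges):
    """All vertex permutations (as dicts) that carry the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    result = []
    for images in permutations(vertices):
        s = dict(zip(vertices, images))
        if {frozenset((s[a], s[b])) for a, b in edges} == edge_set:
            result.append(s)
    return result

X = [1, 2, 3, 4]
square = [(1, 2), (2, 3), (3, 4), (4, 1)]
G = automorphisms(X, square)

# Group the automorphisms by where they send the vertex x = 1.
x = 1
fibers = {v: [s for s in G if s[x] == v] for v in X}
stabilizer = fibers[x]

print(len(G), len(X), len(stabilizer))
print({v: len(fiber) for v, fiber in fibers.items()})
```

Each fiber is one equivalence class mod $H$, so all fibers have the same size as the stabilizer.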


We turn now to the computation of the size of the automorphism
groups of the two graphs introduced at the beginning of this section. In
order to compute the order of a stabilizer, we re-draw the graph from
the “point of view” of a particular vertex. Thus, consider the following
graphs, where the second emphasizes the role of the vertex 1:

[Figure: Graph C, drawn in two ways.]

If $H$ is the stabilizer of the vertex 1, then surely $H$ must permute the
three vertices 2, 4, 6 and must permute the vertices 3 and 5. Furthermore,
it is easy to see that any permutation of 2, 4, 6 and of 3, 5 will
determine an automorphism of the graph which fixes vertex 1. Since
there are $6 = 3!$ permutations of 2, 4, and 6, and since there are 2
permutations of 3 and 5, we conclude that there are exactly $6 \times 2 = 12$
automorphisms which fix the vertex 1. Therefore the full automorphism
group has order $6 \times 12 = 72$.


We turn now to the second graph considered in our introduction;
again we draw two versions:

[Figure: Graph D, drawn in two ways.]

If $H$ is the stabilizer of the vertex 1, then $H$ must permute the three
vertices 3, 4, 5 and must permute the vertices 2 and 6. However,
in this case, there are some restrictions. Note that $H$ must actually
fix the vertex 4 (because there’s an edge joining 3 and 5). Thus an
automorphism $\tau \in H$ can only either fix the vertices 3 and 5 or can
transpose them: $\tau(3) = 5$, $\tau(5) = 3$. However, once we know what $\tau$
does to $\{3, 4, 5\}$ we can determine its effect on 2 and 6. If $\tau$ fixes 3 and
5, then it’s easy to see that $\tau$ also fixes 2 and 6 (verify this!). Likewise,
if $\tau$ transposes 3 and 5, then $\tau$ also transposes 2 and 6, meaning that
there are only two elements in $H$, $e$ and the element

$$\tau = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 6 & 5 & 4 & 3 & 2 \end{pmatrix}.$$

From the above we conclude that the automorphism group of this graph
has $6 \times 2 = 12$ elements, meaning that the first graph is six times more
symmetrical than the second!
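Both counts can be double-checked by exhaustive search over the $6! = 720$ permutations of the vertices. The drawings are not reproduced here, so the edge lists in the sketch below are assumptions reconstructed from the discussion: for the first graph, every odd vertex is joined to every even vertex (so vertex 1 has neighbors 2, 4, 6); for the second, two triangles 1-3-5 and 2-4-6 are joined by the edges 1-4, 3-6, 5-2 (so vertex 1 has neighbors 3, 4, 5 and there is an edge joining 3 and 5). The function name is also made up for this illustration.

```python
from itertools import permutations

def count_automorphisms(vertices, edges, fixed=None):
    """Count edge-preserving permutations of the vertices; if `fixed` is
    given, count only those fixing that vertex (the size of its stabilizer)."""
    edge_set = {frozenset(e) for e in edges}
    count = 0
    for images in permutations(vertices):
        s = dict(zip(vertices, images))
        if fixed is not None and s[fixed] != fixed:
            continue
        if {frozenset((s[a], s[b])) for a, b in edges} == edge_set:
            count += 1
    return count

V = [1, 2, 3, 4, 5, 6]
# Assumed edge lists, inferred from the text rather than read off a figure.
first_graph = [(a, b) for a in (1, 3, 5) for b in (2, 4, 6)]
second_graph = [(1, 3), (3, 5), (5, 1), (2, 4), (4, 6), (6, 2),
                (1, 4), (3, 6), (5, 2)]

for name, edges in (("first", first_graph), ("second", second_graph)):
    total = count_automorphisms(V, edges)
    stab = count_automorphisms(V, edges, fixed=1)
    print(name, total, stab, len(V) * stab)
```

If the actual drawings differ from these assumed edge lists, only the two lists need to be changed.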


EXERCISES 


1. Compute |G|, where G is the automorphism group of Graph A, 
above. Is G abelian? 


2. Compute $|G|$, where $G$ is the automorphism group of Graph B,
above. Is $G$ abelian?


3. Find an element of order 6 in the stabilizer of vertex 1 of Graph
C, above.


4. As a challenge, compute the order of the automorphism group of
the Petersen graph.¹⁵

[Figure: the Petersen graph.]


¹⁵The answer is $120 = 5!$. Indeed, the automorphism group is isomorphic with the symmetric group
$S_5$. Here’s a possible approach. Construct the graph $\Gamma$ whose vertices are the 2-element subsets of
$\{1, 2, 3, 4, 5\}$ and where we stipulate that $A$ and $B$ are adjacent precisely when $A \cap B = \emptyset$. One
shows first that this graph is actually isomorphic with the Petersen graph. (Just draw a picture!)
Next, if we let $S_5$ operate on the elements of $\{1, 2, 3, 4, 5\}$ in the natural way, then $S_5$ actually acts
as a group of automorphisms of $\Gamma$.
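The approach described in the footnote can be partially verified by machine. The sketch below is an illustration only (all names are chosen here): it builds the graph on the 2-element subsets of $\{1, 2, 3, 4, 5\}$ with disjoint subsets adjacent, checks that it has 10 vertices of degree 3, and confirms that each of the 120 permutations of $\{1, \ldots, 5\}$ induces an automorphism, with different permutations inducing different automorphisms. This shows that the automorphism group has at least 120 elements; it does not by itself show that there are no others.

```python
from itertools import combinations, permutations

points = range(1, 6)
vertices = [frozenset(c) for c in combinations(points, 2)]
edges = {frozenset((A, B)) for A, B in combinations(vertices, 2) if not (A & B)}
degree = {A: sum(1 for e in edges if A in e) for A in vertices}

def induced_map(perm):
    """The map on 2-element subsets induced by a permutation of the points."""
    return {A: frozenset(perm[a] for a in A) for A in vertices}

def preserves_edges(m):
    """True when m carries the edge set onto itself."""
    return {frozenset((m[A], m[B])) for A, B in edges} == edges

induced = set()
all_preserve = True
for images in permutations(points):
    m = induced_map(dict(zip(points, images)))
    all_preserve = all_preserve and preserves_edges(m)
    induced.add(tuple(m[A] for A in vertices))

print(len(vertices), len(edges), set(degree.values()))
print(all_preserve, len(induced))
```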


Chapter 5 


Series and Differential Equations 


The methods and results of this chapter pave the road to the students’
more serious study of “mathematical analysis,” that branch of mathematics
which includes calculus and differential equations. It is assumed
that the student has had a background in calculus at least equivalent
with that represented either in IB mathematics HL year 2 or AP Calculus
(AB). The key ideas revolving around limits will be reviewed,
leading to substantial coverage of series and differential equations.


5.1 Quick Survey of Limits 


As quickly becomes obvious to even the casual learner, the study of
calculus rests in a fundamental way on the notion of limit. Thus, a
reasonable starting point in this somewhat more “advanced” study is
to be reminded of the notion of the “limit of a function as $x$ approaches
$a$ (either of which might be $\pm\infty$).”


5.1.1 Basic definitions 


DEFINITION. Let $f$ be a function defined in a neighborhood of the real
number $a$ (except possibly at $x = a$). We say that the limit of $f(x)$ is
$L$ as $x$ approaches $a$, and write

$$\lim_{x \to a} f(x) = L,$$

if for any real number $\epsilon > 0$, there is another real number $\delta > 0$ (which
in general depends on $\epsilon$) such that whenever $0 < |x - a| < \delta$ then
$|f(x) - L| < \epsilon$.


Notice that in the above definition we stipulate $0 < |x - a| < \delta$ rather
than just saying $|x - a| < \delta$ because we really don’t care what happens
when $x = a$.
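The way $\delta$ depends on $\epsilon$ can be made concrete with one worked case. The sketch below uses an example chosen for illustration, $f(x) = x^2$ at $a = 2$ with $L = 4$, together with the choice $\delta = \min(1, \epsilon/5)$: if $|x - 2| < \delta \le 1$ then $|x + 2| < 5$, so $|x^2 - 4| = |x - 2|\,|x + 2| < 5\delta \le \epsilon$. The code merely samples points in the punctured interval, so it is a sanity check of the inequality and not a proof.

```python
def delta_for(eps):
    """A delta that works for f(x) = x**2 at a = 2 (see the estimate above)."""
    return min(1.0, eps / 5.0)

def check(eps, samples=1000):
    """Sample points with 0 < |x - 2| < delta and test |f(x) - 4| < eps."""
    a, L, delta = 2.0, 4.0, delta_for(eps)
    for k in range(1, samples):
        t = delta * k / samples          # 0 < t < delta
        for x in (a - t, a + t):
            if not abs(x * x - L) < eps:
                return False
    return True

for eps in (1.0, 0.1, 1e-3):
    print(eps, delta_for(eps), check(eps))
```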


In defining limits involving $\infty$ only slight modifications are necessary.


DEFINITION.


Limits at $\infty$. Let $f$ be a function defined for all $x > N$. We say that
the limit of $f(x)$ is $L$ as $x$ approaches $\infty$, and write

$$\lim_{x \to \infty} f(x) = L,$$

if for any real number $\epsilon > 0$, there is another real number $K$
(which in general depends on $\epsilon$) such that whenever $x > K$ then
$|f(x) - L| < \epsilon$.


In an entirely similar way, we may define $\lim_{x \to -\infty} f(x) = L$.


Limits of $\infty$. Let $f$ be a function defined in a neighborhood of the real
number $a$. We say that the limit of $f(x)$ is $\infty$ as $x$ approaches $a$,
and write

$$\lim_{x \to a} f(x) = \infty,$$

if for any real number $N$, there is another real number $\delta > 0$ (which
in general depends on $N$) such that whenever $0 < |x - a| < \delta$ then
$f(x) > N$.


Similarly, one defines

$$\lim_{x \to a} f(x) = -\infty, \qquad \lim_{x \to \infty} f(x) = \infty,$$

and so on.


Occasionally, we need to consider one-sided limits, defined as follows.

DEFINITION. Let $f$ be a function defined on an interval of the form
$a < x < b$. We say that the limit of $f(x)$ is $L$ as $x$ approaches $a$
from the right, and write

$$\lim_{x \to a^+} f(x) = L,$$

if for any real number $\epsilon > 0$, there is another real number $\delta > 0$ (which
in general depends on $\epsilon$) such that whenever $0 < x - a < \delta$ then
$|f(x) - L| < \epsilon$.


Similarly, one defines $\lim_{x \to a^-} f(x) = L$.


Limits behave in a very reasonable manner, as indicated in the fol- 
lowing theorem. 


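THEOREM. Let $f$ and $g$ be functions defined in a punctured neighborhood
of $a$.¹ Assume that

$$\lim_{x \to a} f(x) = L, \qquad \lim_{x \to a} g(x) = M.$$

Then,

$$\lim_{x \to a}\bigl(f(x) + g(x)\bigr) = L + M, \quad\text{and}\quad \lim_{x \to a} f(x)g(x) = LM.$$


PROOF. Let $\epsilon > 0$ be given, and assume that $\delta > 0$ has been chosen so as to guarantee that
whenever $0 < |x - a| < \delta$, then

$$|f(x) - L| < \frac{\epsilon}{2}, \quad\text{and}\quad |g(x) - M| < \frac{\epsilon}{2}.$$

Then,

$$|f(x) + g(x) - (L + M)| \le |f(x) - L| + |g(x) - M| < \frac{\epsilon}{2} + \frac{\epsilon}{2} = \epsilon,$$

proving that $\lim_{x \to a}(f(x) + g(x)) = L + M$.

Next, assume that $1 > \epsilon > 0$, and let $\delta > 0$ be a real number such
that whenever $0 < |x - a| < \delta$,

$$|f(x) - L| < \frac{\epsilon}{3},\ \frac{\epsilon}{3|M|}, \quad\text{and}\quad |g(x) - M| < \frac{\epsilon}{3},\ \frac{\epsilon}{3|L|}.$$

(If either of $L$, $M = 0$, ignore the corresponding fraction $\frac{\epsilon}{3|L|}$, $\frac{\epsilon}{3|M|}$.) Then

$$\begin{aligned}
|f(x)g(x) - LM| &= |(f(x) - L)(g(x) - M) + (f(x) - L)M + (g(x) - M)L| \\
&\le |(f(x) - L)(g(x) - M)| + |(f(x) - L)M| + |(g(x) - M)L| \\
&< \frac{\epsilon^2}{9} + \frac{\epsilon}{3} + \frac{\epsilon}{3} \\
&< \frac{\epsilon}{3} + \frac{\epsilon}{3} + \frac{\epsilon}{3} = \epsilon,
\end{aligned}$$

proving that $\lim_{x \to a} f(x)g(x) = LM$.


¹A “punctured” neighborhood of the real number $a$ is simply a subset of the form $\{x \in \mathbb{R} \mid 0 <
|x - a| < d\}$, for some positive real number $d$.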


As indicated above, in computing lim f(x) we are not concerned 
with f(a); in fact a need not even be in the domain of f. However, in 
defining continuity at a point, we require more: 


DEFINITION. Let f be defined in a neighborhood of a. We say that f 
is continuous at x = a if 


lim f(z) = f(a) 


As a simply corollary of the above theorem we may conclude that 
polynomial functions are everywhere continuous. 


The student will recall that the derivative of a function is defined 
in terms of a limit. We recall this important concept here. 


2Note that in the above proof we have repeatedly used the so-called “triangle inequality,” which 
states that for real numbers a, b € R, |a+ b| < |a| + |b]. A moment’s thought reveals that this is 
pretty obvious! 


SECTION 5.1 QUICK SURVEY OF LIMITS 249 


DEFINITION. Let $f$ be a function defined in a neighborhood of $a$. If

$$\lim_{x\to a} \frac{f(x) - f(a)}{x - a} = L,$$

we say that $f$ is differentiable at $x = a$ and write $f'(a) = L$, calling
$f'(a)$ the derivative of $f$ at $a$.
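
The limit in this definition can be explored numerically. The following is only a sketch under our own assumptions: the helper name `difference_quotient` and the sample function $x^3$ at $a = 2$ are illustrative choices, not taken from the text. It evaluates the quotient $(f(a+h) - f(a))/h$ for shrinking $h$, which should approach $f'(2) = 12$.

```python
def difference_quotient(f, a, h):
    """Return (f(a + h) - f(a)) / h, whose limit as h -> 0 is f'(a)."""
    return (f(a + h) - f(a)) / h


def cube(x):
    """Illustrative sample function f(x) = x^3 (our choice)."""
    return x ** 3


if __name__ == "__main__":
    # At a = 2 the derivative of x^3 is 3 * 2^2 = 12.
    for h in (1e-1, 1e-2, 1e-3, 1e-4):
        print(h, difference_quotient(cube, 2.0, h))
```

The printed values only suggest the limit; the definition itself still requires the $\epsilon$-$\delta$ argument.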


In mathematical analysis we often encounter the notion of a sequence, which is nothing more than a function

$$f : \{0, 1, 2, \ldots\} \longrightarrow \mathbb{R}.$$

It is customary to write the individual terms of a sequence
$f(0), f(1), f(2), \ldots$ as subscripted quantities, say, as $a_0, a_1, a_2, \ldots$.


Sequences may or may not have limits.


DEFINITION. Let $(a_n)_{n\ge 0}$ be a sequence. We say that the limit of the
sequence is the real number $L \in \mathbb{R}$ and write $\lim_{n\to\infty} a_n = L$, if given
$\epsilon > 0$ there exists a real number $N$ such that whenever $n > N$ then
$|a_n - L| < \epsilon$.


We shall begin a systematic study of sequences (and “series” ) in the 
next section. 


Finally, we would like to give one more example of a limiting process: 
that associated with the “Riemann integral.” Here we have a function 
f defined on the closed interval [a,b], and a partition P of the interval 
into n subintervals 


$$P : a = x_0 < x_1 < \cdots < x_n = b.$$

On each subinterval $[x_{i-1}, x_i]$ let

$$M_i = \max_{x_{i-1} \le x \le x_i} f(x), \qquad m_i = \min_{x_{i-1} \le x \le x_i} f(x).$$

The upper Riemann sum relative to the above partition is the
sum

$$U(f; P) = \sum_{i=1}^{n} M_i (x_i - x_{i-1}),$$

and the lower Riemann sum relative to the above partition is the
sum

$$L(f; P) = \sum_{i=1}^{n} m_i (x_i - x_{i-1}).$$


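Before continuing, we need two more fundamental concepts, the
least upper bound and greatest lower bound of a set of real numbers. Namely, if $A \subseteq \mathbb{R}$, we set

$$\mathrm{LUB}(A) = \min\{d \mid d \ge a \text{ for all } a \in A\}, \qquad \mathrm{GLB}(A) = \max\{d \mid d \le a \text{ for all } a \in A\}.$$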


Finally, we define the sets 


$$U(f) = \{U(f; P) \mid P \text{ is a partition of } [a, b]\},$$
$$L(f) = \{L(f; P) \mid P \text{ is a partition of } [a, b]\}.$$


DEFINITION. If $\mathrm{LUB}(L(f))$ and $\mathrm{GLB}(U(f))$ both exist, and if
$\mathrm{LUB}(L(f)) = \mathrm{GLB}(U(f))$, we say that $f$ is Riemann integrable
over $[a, b]$ and call the common value the Riemann integral of $f$
over the interval $[a, b]$.


EXAMPLE. Consider the function $f(x) = x^3$, $0 \le x \le 2$, and consider the partition of $[0,2]$ into $n$ equally-spaced subintervals. Thus, if
$P : 0 = x_0 < x_1 < \cdots < x_n = 2$ is this partition, then $x_i = \frac{2i}{n}$, $i = 0, 1, 2, \ldots, n$. Since $f$ is increasing over this interval, we see that the
maximum of $f$ over each subinterval occurs at the right endpoint and
that the minimum occurs at the left endpoint. It follows, therefore,
that

$$U(f; P) = \sum_{i=1}^{n} \left(\frac{2i}{n}\right)^3 \cdot \frac{2}{n} = \frac{16}{n^4}\sum_{i=1}^{n} i^3, \qquad L(f; P) = \sum_{i=0}^{n-1} \left(\frac{2i}{n}\right)^3 \cdot \frac{2}{n} = \frac{16}{n^4}\sum_{i=0}^{n-1} i^3.$$

Next, one knows that $\sum_{i=1}^{n} i^3 = \frac{1}{4}n^2(n+1)^2$; therefore,

$$U(f; P) = \frac{4n^2(n+1)^2}{n^4}, \qquad L(f; P) = \frac{4n^2(n-1)^2}{n^4}.$$

Finally, we note that for any partition $P'$ of $[0,2]$, $0 \le L(f; P') \le U(f; P') \le 16$, and so it is clear that $\mathrm{GLB}(U(f))$ and $\mathrm{LUB}(L(f))$ both
exist and that $\mathrm{GLB}(U(f)) \ge \mathrm{LUB}(L(f))$. Finally, for the partition $P$
above we have

$$L(f; P) \le \mathrm{LUB}(L(f)) \le \mathrm{GLB}(U(f)) \le U(f; P).$$

Therefore, we have

$$4 = \lim_{n\to\infty} L(f; P) \le \mathrm{LUB}(L(f)) \le \mathrm{GLB}(U(f)) \le \lim_{n\to\infty} U(f; P) = 4,$$

and so it follows that

$$\int_0^2 x^3\,dx = 4.$$
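
The squeeze in this example can be watched numerically. The sketch below is an illustration only: the function name `riemann_sums` is our own, and the code assumes (as in the example) that the integrand is increasing, so the minimum and maximum on each subinterval sit at the left and right endpoints.

```python
def riemann_sums(f, a, b, n):
    """Lower and upper Riemann sums of an INCREASING f on [a, b]
    for the partition into n equal subintervals."""
    width = (b - a) / n
    xs = [a + i * width for i in range(n + 1)]
    lower = sum(f(xs[i]) * width for i in range(n))      # minima at left endpoints
    upper = sum(f(xs[i + 1]) * width for i in range(n))  # maxima at right endpoints
    return lower, upper


def cube(x):
    return x ** 3


if __name__ == "__main__":
    for n in (10, 100, 1000):
        lower, upper = riemann_sums(cube, 0.0, 2.0, n)
        print(n, lower, upper)  # both columns close in on 4
```

For an increasing integrand the gap between the two sums is exactly $(f(b) - f(a))(b-a)/n$, here $16/n$, which is why both sums are forced to the same limit.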
For completeness’ sake, we present the following fundamental result 
without proof. 


THEOREM. (Fundamental Theorem of Calculus) Assume that we are
given the function $f$ defined on the interval $[a,b]$. If there exists a
differentiable function $F$ also defined on $[a,b]$ and such that $F'(x) = f(x)$ for all $x \in [a,b]$, then $f$ is Riemann integrable on $[a, b]$ and

$$\int_a^b f(x)\,dx = F(b) - F(a).$$


EXERCISES 
1. Let $f$ and $g$ be functions such that

(a) $f$ is defined in a punctured neighborhood of $a$,
(b) $\lim_{x\to a} f(x) = L$,
(c) $g$ is defined in a punctured neighborhood of $L$, and
(d) $\lim_{x\to L} g(x) = M$.

Show that $\lim_{x\to a} g(f(x)) = M$.


2. Show that if $a > 0$, then $\lim_{x\to a} \sqrt{x} = \sqrt{a}$.

3. Show that if $a \in \mathbb{R}$, then $\lim_{x\to a} \sqrt{1+x^2} = \sqrt{1+a^2}$.

4. Prove that the sequence 1, 0, 1, 0, ... does not have a limit.

5. Let $f : D \to \mathbb{R}$ be a real-valued function, where $D$ (the domain)
is some subset of $\mathbb{R}$. Prove that $f$ is continuous at $a \in D$ if and
only if for every sequence $a_1, a_2, \ldots$ which converges to $a$, we have
$\lim_{n\to\infty} f(a_n) = f(a)$.


6. Here we revisit Exercises 11 and 12 on page 239.

(a) Let $f : \mathbb{R} \to \mathbb{R}$ be a differentiable homomorphism, i.e., $f$ is
differentiable and satisfies $f(x+y) = f(x) + f(y)$ for all $x, y \in \mathbb{R}$. Prove that there exists $c \in \mathbb{R}$ such that $f(x) = cx$, $x \in \mathbb{R}$.
(This is easy!)

(b) Let $f : \mathbb{R} \to \mathbb{R}$ be a continuous homomorphism, and prove
that the same conclusion of part (a) holds. (Hint: you want to
prove that for all $a, x \in \mathbb{R}$, $f(ax) = af(x)$. This guarantees
that $f(x) = xf(1)$.)³


7. Define $f(x) = \sqrt{x + \sqrt{x + \sqrt{x + \cdots}}}$, $x \ge 0$. Clearly $f(0) = 0$. Is
$f$ continuous at $x = 0$? (Hint: note that if we set $y = f(x)$, then
$y = \sqrt{x+y}$, and so $y^2 = x + y$. Using the quadratic formula you
can solve for $y$ in terms of $x$.)


8. Recall Euler’s φ-function $\phi$, given on page 63. Define the set $A$ of
real numbers

$$A = \left\{ \frac{\phi(n)}{n} \;\middle|\; n \in \mathbb{N} \right\}.$$

Show that $\mathrm{LUB}(A) = 1$ and $\mathrm{GLB}(A) = 0$.⁴


³It may surprise you to learn that “most” homomorphisms $\mathbb{R} \to \mathbb{R}$ are not continuous. However,
the discontinuous homomorphisms are essentially impossible to write down. (Their existence involves
a bit of advanced mathematics: using a “Hamel basis” for the real numbers, one can easily write
down discontinuous homomorphisms $\mathbb{R} \to \mathbb{R}$.)


9. (The irrationality of π.) This exercise will guide you through a
proof of the irrationality of π and follows roughly the proof of Ivan
Niven.⁵ Assume, by way of contradiction, that we may express
$\pi = \frac{a}{b}$, where $a$ and $b$ are positive integers. For each integer $n \ge 1$
define the function

$$f_n(x) = \frac{x^n (a - bx)^n}{n!}.$$

Using the assumption that $\pi = \frac{a}{b}$, show that

(a) [statement missing in the source];
(b) $f_n(\pi - x) = f_n(x)$;
(c) $f_n^{(i)}(0)$ is an integer for all $i \ge 0$;
(d) $f_n^{(i)}(\pi)$ is an integer for all $i \ge 0$ (use (b)).

Next, define the new functions

$$F_n(x) = f_n(x) - f_n^{(2)}(x) + f_n^{(4)}(x) - \cdots + (-1)^n f_n^{(2n)}(x) = \sum_{i=0}^{n} (-1)^i f_n^{(2i)}(x), \quad n \ge 1,$$

and show that

(e) $F_n(0)$ and $F_n(\pi)$ are both integers;
(f) $F_n''(x) + F_n(x) = f_n(x)$ (note that if $i > 2n$, then $f_n^{(i)}(x) = 0$);
(g) $\frac{d}{dx}\left[F_n'(x)\sin x - F_n(x)\cos x\right] = f_n(x)\sin x$;
(h) $\int_0^{\pi} f_n(x)\sin x\,dx$ is an integer for all $n \ge 1$.

Finally, show that

(i) $f_n(x)\sin x > 0$ when $0 < x < \pi$;
(j) $f_n(x)\sin x < \frac{\pi^n a^n}{n!}$ when $0 < x < \pi$;

Conclude from (j) that

(k) $\int_0^{\pi} f_n(x)\sin x\,dx < \frac{\pi^{n+1} a^n}{n!}$ for all $n \ge 1$.

Why are (h) and (k) incompatible? (Note that $\lim_{n\to\infty} \frac{\pi^{n+1} a^n}{n!} = 0$.)


⁴Showing that $\mathrm{LUB}(A) = 1$ is pretty easy: define $P = \left\{ \frac{\phi(p)}{p} \mid p \text{ is prime} \right\}$ and show that
$\mathrm{LUB}(P) = 1$. Showing that $\mathrm{GLB}(A) = 0$ is a bit trickier. Try this: let $p_1 = 2, p_2 = 3, p_3 = 5, \ldots$,
and so $p_k$ is the $k$-th prime. Note that

$$\frac{\phi(p_1 p_2 \cdots p_r)}{p_1 p_2 \cdots p_r} = \left(1 - \frac{1}{p_1}\right)\left(1 - \frac{1}{p_2}\right)\cdots\left(1 - \frac{1}{p_r}\right).$$

The trick is to show that $\frac{\phi(p_1 p_2 \cdots p_r)}{p_1 p_2 \cdots p_r} \to 0$ as $r \to \infty$. Next, one has the “harmonic series”
(see page 265) $\sum_{n=1}^{\infty} \frac{1}{n}$, which is shown on page 265 to be infinite. However, from the Fundamental
Theorem of Arithmetic (see page 76), one has

$$\sum_{n=1}^{\infty} \frac{1}{n} = \left(1 + \frac12 + \frac{1}{2^2} + \cdots\right)\left(1 + \frac13 + \frac{1}{3^2} + \cdots\right)\left(1 + \frac15 + \frac{1}{5^2} + \cdots\right)\cdots = \left(\frac{1}{1 - \frac12}\right)\left(\frac{1}{1 - \frac13}\right)\left(\frac{1}{1 - \frac15}\right)\cdots.$$

From this one concludes that, as stated above, $\frac{\phi(p_1 p_2 \cdots p_r)}{p_1 p_2 \cdots p_r} \to 0$ as $r \to \infty$.

⁵Bull. Amer. Math. Soc. Volume 53, Number 6 (1947), 509.


5.1.2 Improper integrals 
There are two types of improper integrals of concern to us.


(I.) Those having at least one infinite limit of integration, such as

$$\int_a^{\infty} f(x)\,dx \quad\text{or}\quad \int_{-\infty}^{\infty} f(x)\,dx.$$


(II.) Those for which the integrand becomes unbounded within the 
interval over which the integral is computed. Examples of these 
include 


i a (p > 0), id 




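The definitions of these improper integrals are in terms of limits.
For example

$$\int_a^{\infty} f(x)\,dx = \lim_{b\to\infty} \int_a^b f(x)\,dx,$$

$$\int_{-\infty}^{\infty} f(x)\,dx = \lim_{a\to-\infty} \int_a^0 f(x)\,dx + \lim_{b\to\infty} \int_0^b f(x)\,dx.$$

Likewise, for example,

$$\int_0^1 \frac{dx}{x^p} = \lim_{a\to 0^+} \int_a^1 \frac{dx}{x^p},$$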


8. de : a dx : 3 dx 
i = iim + lim | 
ee ee G2 a =D” hor lbs Qe 
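Relative to the above definition, the following is easy.


THEOREM. We have

$$\int_1^{\infty} \frac{dx}{x^p} = \begin{cases} \dfrac{1}{p-1} & \text{if } p > 1, \\[6pt] \infty & \text{if } p \le 1. \end{cases}$$

PROOF. We have, if $p \ne 1$, that

$$\int_1^{\infty} \frac{dx}{x^p} = \lim_{a\to\infty} \left.\frac{x^{1-p}}{1-p}\right|_1^a = \lim_{a\to\infty}\left(\frac{a^{1-p}}{1-p} - \frac{1}{1-p}\right) = \begin{cases} \dfrac{1}{p-1} & \text{if } p > 1, \\[6pt] \infty & \text{if } p < 1. \end{cases}$$

If $p = 1$, then

$$\int_1^{\infty} \frac{dx}{x} = \lim_{a\to\infty} \ln x \,\Big|_1^a = \lim_{a\to\infty} \ln a = \infty.$$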


EXAMPLE. Compute the improper integral $\int_2^{\infty} \frac{dx}{x \ln^p x}$, where $p > 1$.

This is a fairly simple integration: using the substitution ($u = \ln x$) one
first computes the indefinite integral

$$\int \frac{dx}{x \ln^p x} = \frac{1}{(1-p)\ln^{p-1} x}.$$

Therefore,

$$\int_2^{\infty} \frac{dx}{x \ln^p x} = \lim_{a\to\infty} \left.\frac{1}{(1-p)\ln^{p-1} x}\right|_2^a = -\frac{1}{(1-p)\ln^{p-1} 2} = \frac{1}{(p-1)\ln^{p-1} 2}.$$
EXERCISES 


1. For each improper integral below, compute its value (which might 
be +00) or determine that the integral does not exist. 


(®) fr v2 


0) fi v= 
Oi 


e2 —A 


d) [incar 


oo 2 
2. Let k > 1 and p > 1 and prove that | sin’ — dx < oo. (Hint: 
Z 
2 54 Z 
note that if 2? > 4 then sin* (= < sin (=) < =) 
zP zP P 
3. Compute 


lim [et cos(at) dt. 
L—> OO 0 


4. Let $A$, $B$ be constants, $A > 0$. Show that

$$\int_0^{\infty} e^{-At} \sin Bt \, dt = \frac{B}{A^2 + B^2}.$$


5. Define the function 


1 if lz)<2 
te) = | 1 ics: 


0 otherwise. 
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Now let f be a continuous function defined for all real numbers 
and compute 


. Ll fo r—a 
py a") soe 


in terms of f and a. 


6. (The (real) Laplace transform) Let $f = f(x)$ be a function
defined for $x \ge 0$. Define a new function $F = F(s)$, called the
Laplace transform of $f$, by setting $F(s) = \int_0^{\infty} e^{-sx} f(x)\,dx$, where
$s > 0$. Now let $f$ be the function defined by

$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1, \\ 0 & \text{if } x > 1. \end{cases}$$

Compute the Laplace transform $F = F(s)$ explicitly as a function
of $s$.


7. Let $f(x) = \sin 2\pi x$, $x \ge 0$. Compute the Laplace transform $F = F(s)$ explicitly as a function of $s$. (You'll need to do integration
by parts twice!)


5.1.3 Indeterminate forms and l’Hôpital’s rule


Most interesting limits, such as those defining the derivative, are “indeterminate” in the sense that they are of the form $\lim_{x\to a} \frac{f(x)}{g(x)}$ where the
numerator and denominator both tend to 0 (or to $\infty$). Students learn
to compute the derivatives of trigonometric functions only after they
have been shown that the limit

$$\lim_{x\to 0} \frac{\sin x}{x} = 1.$$

At the same time, you’ll no doubt remember that the computation of this limit
was geometrical in nature and involved
an analysis of the diagram to the right.

[Figure: diagram with segments labeled sin x and tan x.]
The above limit is called a 0/0 indeterminate form because the 
limits of both the numerator and denominator are 0. 
You’ve seen many others; here are two more: 


$$\lim_{x\to 3} \frac{2x^2 - 7x + 3}{x - 3} \quad\text{and}\quad \lim_{x\to 1} \frac{x^5 - 1}{x - 1}.$$


Note that in both cases the limits of the numerator and denominator 
are both 0. Thus, these limits, too, are 0/0 indeterminate forms. 

While the above limits can be computed using purely algebraic meth- 
ods, there is an alternative—and often quicker—method that can be 
used when algebra is combined with a little differential calculus. 


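In general, a 0/0 indeterminate form is a limit of the form $\lim_{x\to a} \frac{f(x)}{g(x)}$
where both $\lim_{x\to a} f(x) = 0$ and $\lim_{x\to a} g(x) = 0$. Assume, in addition, that
$f$ and $g$ are both differentiable and that $f'$ and $g'$ are both continuous
at $x = a$ (a very reasonable assumption, indeed!). Then $f(a) = g(a) = 0$ and, assuming $g'(a) \ne 0$, we have

$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f(x) - f(a)}{g(x) - g(a)} = \lim_{x\to a} \frac{\;\dfrac{f(x) - f(a)}{x - a}\;}{\dfrac{g(x) - g(a)}{x - a}} = \frac{f'(a)}{g'(a)} = \frac{\lim_{x\to a} f'(x)}{\lim_{x\to a} g'(x)} \quad \text{(by continuity of the derivatives)}.$$

This result we summarize as


l’Hôpital’s Rule. Let $f$ and $g$ be functions differentiable on some
interval containing $x = a$, and assume that $f'$ and $g'$ are continuous at
$x = a$. Then

$$\lim_{x\to a} \frac{f(x)}{g(x)} = \frac{\lim_{x\to a} f'(x)}{\lim_{x\to a} g'(x)}.$$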


As a simple illustration, watch this:

$$\lim_{x\to 3} \frac{2x^2 - 7x + 3}{x - 3} = \frac{\lim_{x\to 3} (4x - 7)}{\lim_{x\to 3} 1} = 5,$$

which agrees with the answer obtained algebraically.


In a similar manner, one defines $\infty/\infty$ indeterminate forms; these
are treated as above, namely by differentiating numerator and denominator:


l’Hôpital’s Rule ($\infty/\infty$). Let $f$ and $g$ be functions differentiable on
some interval containing $x = a$, assume that $\lim_{x\to a} f(x) = \infty = \lim_{x\to a} g(x)$, and
assume that $f'$ and $g'$ are continuous at $x = a$. Then

$$\lim_{x\to a} \frac{f(x)}{g(x)} = \lim_{x\to a} \frac{f'(x)}{g'(x)}.$$

There are other indeterminate forms as well: $0 \cdot \infty$, $1^{\infty}$, and $\infty^0$.
These can be treated as indicated in the examples below.


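EXAMPLE 1. Compute $\lim_{x\to 0^+} x^2 \ln x$. Note that this is a $0 \cdot \infty$ indeterminate form. It can easily be converted to an $\frac{\infty}{\infty}$ indeterminate form
and handled as above:

$$\lim_{x\to 0^+} x^2 \ln x = \lim_{x\to 0^+} \frac{\ln x}{(1/x^2)} \overset{\text{l'Hôpital}}{=} \lim_{x\to 0^+} \frac{1/x}{-2/x^3} = \lim_{x\to 0^+} \left(-\frac{x^2}{2}\right) = 0.$$

Other indeterminate forms can be treated as in the following examples.


EXAMPLE 2. Compute $\lim_{x\to\infty} \left(1 - \frac{4}{x}\right)^x$. Here, if we set $L$ equal to this
limit (if it exists!), then we have, by continuity of the logarithm, that

$$\begin{aligned}
\ln L &= \ln \lim_{x\to\infty} \left(1 - \frac{4}{x}\right)^x \\
&= \lim_{x\to\infty} \ln \left(1 - \frac{4}{x}\right)^x \\
&= \lim_{x\to\infty} x \ln \left(1 - \frac{4}{x}\right) \\
&= \lim_{x\to\infty} \frac{\ln\left(1 - \frac{4}{x}\right)}{1/x} \\
&\overset{\text{l'Hôpital}}{=} \lim_{x\to\infty} \frac{4 / \left(x^2 \left(1 - \frac{4}{x}\right)\right)}{-1/x^2} \\
&= \lim_{x\to\infty} \frac{-4}{1 - \frac{4}{x}} = -4.
\end{aligned}$$

This says that $\ln L = -4$, which implies that $L = e^{-4}$.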


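EXAMPLE 3. This time, try $\lim_{\theta\to(\pi/2)^-} (\cos\theta)^{\cos\theta}$. The same trick applied
above works here as well. Setting $L$ to be this limit, we have

$$\begin{aligned}
\ln L &= \ln \lim_{\theta\to(\pi/2)^-} (\cos\theta)^{\cos\theta} \\
&= \lim_{\theta\to(\pi/2)^-} \ln (\cos\theta)^{\cos\theta} \\
&= \lim_{\theta\to(\pi/2)^-} \cos\theta \ln\cos\theta \\
&= \lim_{\theta\to(\pi/2)^-} \frac{\ln\cos\theta}{1/\cos\theta} \\
&\overset{\text{l'Hôpital}}{=} \lim_{\theta\to(\pi/2)^-} \frac{-\tan\theta}{\sec\theta\tan\theta} \\
&= \lim_{\theta\to(\pi/2)^-} (-\cos\theta) = 0.
\end{aligned}$$

It follows that $\lim_{\theta\to(\pi/2)^-} (\cos\theta)^{\cos\theta} = 1$.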
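
The three limits worked in Examples 1 to 3, namely $x^2\ln x \to 0$ as $x \to 0^+$, $(1 - 4/x)^x \to e^{-4}$ as $x \to \infty$, and $(\cos\theta)^{\cos\theta} \to 1$ as $\theta \to (\pi/2)^-$, can be sanity-checked by plugging in values near the limit point. This is a rough sketch only; the function names are ours, and a numerical table suggests a limit but does not prove it.

```python
import math


def example1(x):
    """x^2 * ln(x), examined as x -> 0+."""
    return x * x * math.log(x)


def example2(x):
    """(1 - 4/x)^x, examined as x -> infinity."""
    return (1 - 4 / x) ** x


def example3(theta):
    """(cos theta)^(cos theta), examined as theta -> (pi/2)-."""
    c = math.cos(theta)
    return c ** c


if __name__ == "__main__":
    for k in (2, 4, 6):
        small = 10.0 ** (-k)
        large = 10.0 ** k
        print(example1(small), example2(large), example3(math.pi / 2 - small))
    print("e^-4 =", math.exp(-4))
```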
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EXERCISES 


1. Using l’Hopital’s rule if necessary, compute the limits indicated 


below: 
(a) li zc —1 “oo — oo” indeterminate 
Magee a form to one of the forms dis- 
(b) lim cos(7x 2D cussed above! ) 
zl (x — 1) (i) Jim (In 2a — In(a + 1)). 
(©) im e410 G) Jim (145) 
. sin 30 Be 
(d) jim 40 (k) lim yn (1+ =] 
sin @? 1/(«—1) 
paeiee 1) ima 
(c) lim = Oe 
1—si m) jim re * 
ie im) 
67/2 1 + cos 20 (iy dnmegrer <a Sa0) 
_ In(a +1) oy 
(g) Jim ‘alos (0) jim Inz In(1 — =) 
(h) lim, (In a — Insinx) (Hint: (p) ie Inzln(1 — x) (Are (0) 
~—> 


you need to convert this andp ) really different’) 


2. Compute f° te 7” dx. 


3. Let $n$ be a non-negative integer. Using mathematical induction,
show that $\int_0^{\infty} x^n e^{-x}\,dx = n!$.


4. (The (real) Gamma function) Let $z > 0$ and define $\Gamma(z) = \int_0^{\infty} x^{z-1} e^{-x}\,dx$. Show that


(a) $\Gamma(n) = (n - 1)!$ for any positive integer $n$;


(b) $\Gamma(z)$ exists (i.e., the improper integral converges) for all $z > 0$.


5. (Convolution) Given functions $f$ and $g$ defined for all $x \in \mathbb{R}$, the
convolution of $f$ and $g$ is defined by

$$f * g(x) = \int_{-\infty}^{\infty} f(t)\,g(x - t)\,dt,$$

provided the improper integral exists. Assuming that $f * g(x)$
exists for all $x$, show that “$*$” is commutative, i.e., that

$$f * g(x) = g * f(x), \quad \text{for all } x.$$

We shall meet the convolution again in our study of statistics
(page 370).


6. Let $a > 0$ be a constant, and set


ee afeSo 
0 if a0); 


Show that if g(x) is defined for all x € R, then 


PaaS eo ‘ie g(t)e" dx 


provided the improper integral exists. 


Now compute f * g(x), where g is as above and where 


(a) f(x) = sin bx, where b > 0 is a constant. 


(b) f(z) = 2°. 
L ata G 

C), Fn 

te) fla) ifa <0. 
inbx ifa>0 

(a) fa) aca ie: where b > 0 is a constant. 
0 a0 


7. (Convolution and the Low-Pass Filter) In electrical engineering
one frequently has occasion to study the RC low-pass filter, whose
schematic diagram is shown below. This is a “series” circuit with
a resistor having resistance $R$ Ω (“ohms”) and a capacitor with
capacitance $C$ F (“Farads”). An input voltage of $x(t)$ volts is
applied at the input terminals and the voltage $y(t)$ volts is observed
at the output. The variable $t$ represents time, measured in seconds.

[Figure: schematic of the RC low-pass filter (resistor R, capacitor C).]

An important theorem of electrical engineering is that
if the input voltage is $x(t)$, then the output voltage is $y = x * h(t)$, where $h$ (the “impulse
response”) is given explicitly by

$$h(t) = \begin{cases} \dfrac{1}{\tau} e^{-t/\tau} & \text{if } t \ge 0, \\[4pt] 0 & \text{if } t < 0, \end{cases}$$

and where $\tau = RC$.


Now assume that $R = 1000$ Ω and that $C = 2\,\mu$F ($= 2 \times 10^{-6}$
Farads). Let

$$x(t) = \begin{cases} \sin 2\pi f t & \text{if } t \ge 0, \\ 0 & \text{if } t < 0, \end{cases}$$

where $f > 0$ is the frequency of the signal (in “hertz” (Hz) or units
of (sec)$^{-1}$). In each case below, compute and graph the output
voltage $y(t)$ as a function of time:


(a) $f = 100$ Hz
(b) $f = 2$ kHz, or 2000 Hz
(c) $f = 100$ kHz


8. (For the courageous student!®) Consider the function 


wf 


and set F(x) = [f@ dt. Show that F’(0) = 0. (Hint: try 
carrying out the following steps: 


⁶I am indebted to Robert Burckel for suggesting this problem.
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(a) If G(x) = f " f(t) dt, then F’(0) = G’(0). 
(b) Show that G is an odd function, i.e., G(—x) = —G(z). 
(c) Use integration by parts to show that if 0 < y < x, then 
z x t? sin(1/t) dt 
tiat = =e aa 
[ f@a = f 


y i 
= x*cos(1/x) — y* cos(1/y) + i 2db < 8a". 
y 
(d) Using part (c), show that for all x, |G(x)| < 32. 
(e) Conclude from part (d) that G’(0) = 0.) 


5.2 Numerical Series 


Way back in Algebra II you learned that certain infinite series not
only made sense, you could actually compute them. The primary (if not
the only!) examples you learned were the infinite geometric series;
one such example might have been


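$$3 + \frac{3}{4} + \frac{3}{16} + \frac{3}{64} + \cdots.$$

Furthermore, you even learned how to compute such infinite geometric
series; in the above example since the first term is $a = 3$ and since
the ratio is $r = \frac{1}{4}$ you quickly compute the sum:

$$3 + \frac{3}{4} + \frac{3}{16} + \frac{3}{64} + \cdots = \frac{a}{1 - r} = \frac{3}{1 - \frac{1}{4}} = 4.$$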


Perhaps unfortunately, most infinite series are not geometric but
rather come in a variety of forms. I’ll give two below; they seem similar
but really exhibit very different behaviors:


Series 1: $1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \cdots$

Series 2: $1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots$

[Figure: two diagrams comparing the rectangles of Series 1 with the curve y = 1/x (left) and those of Series 2 with the curve y = 1/x² (right).]

To see how the above two series differ, we shall consider the above
diagrams. The picture on the left shows that the area represented by
the sum $1 + \frac12 + \frac13 + \cdots$ is greater than the area under the curve with
equation $y = 1/x$ from 1 to $\infty$. Since this area is

$$\int_1^{\infty} \frac{dx}{x} = \ln x \,\Big|_1^{\infty} = \infty,$$

we see that the infinite series $1 + \frac12 + \frac13 + \cdots$ must diverge (to infinity). This divergent series is often called the harmonic series. (This
terminology is justified by Exercise 20 on page 109.) Likewise, we see
that the series $1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots$ can be represented by an area that
is $\le 1 + \int_1^{\infty} \frac{dx}{x^2} = 1 - \frac{1}{x}\Big|_1^{\infty} = 2$, which shows that this series cannot
diverge to $\infty$ and so converges to some number.⁷
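
The contrast between the two series shows up quickly in their partial sums. The sketch below is an illustration under our own naming (`partial_sum` is a hypothetical helper): the harmonic partial sums keep pace with $\ln(k+1)$, the area bound read off the left-hand picture, while the partial sums of $\sum 1/n^2$ stay below 2.

```python
import math


def partial_sum(term, k):
    """Sum term(1) + term(2) + ... + term(k)."""
    return sum(term(n) for n in range(1, k + 1))


def harmonic_term(n):
    return 1 / n


def inverse_square_term(n):
    return 1 / n ** 2


if __name__ == "__main__":
    for k in (10, 1000, 100000):
        print(k,
              partial_sum(harmonic_term, k), math.log(k + 1),
              partial_sum(inverse_square_term, k))
```

The first column of sums grows without bound (slowly, like a logarithm); the second settles near 1.6449, consistent with the value $\pi^2/6$ quoted in the footnote.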


5.2.1 Convergence/divergence of non-negative term series 


Series 2 in the above discussion illustrates an important principle of
the real numbers. Namely, if $a_0, a_1, a_2, \ldots$ is a sequence of real numbers such that


(i) $a_0 \le a_1 \le a_2 \le \cdots$, and


(ii) there is an upper bound $M$ for each element of the sequence,
i.e., $a_n \le M$ for each $n = 0, 1, 2, \ldots$,


then the sequence converges to some limit $L$ (which we might not be
able to compute!): $\lim_{n\to\infty} a_n = L$.


⁷It turns out that this series converges to $\frac{\pi^2}{6}$; this is not particularly easy to show.


Figure 1 


So what do sequences have to do with infinite series? Well, this
is simple: if each term $a_n$ in the infinite series $\sum_{n=0}^{\infty} a_n$ is non-negative,
then the sequence of partial sums satisfies

$$a_0 \le a_0 + a_1 \le a_0 + a_1 + a_2 \le \cdots \le \sum_{n=0}^{k} a_n \le \cdots.$$

Furthermore, if we can establish that for some $M$ each partial sum
$S_k = \sum_{n=0}^{k} a_n$ satisfies $S_k \le M$ then we have a limit, say, $\lim_{k\to\infty} S_k = L$, in
which case we write

$$\sum_{n=0}^{\infty} a_n = L.$$

In order to test a given infinite series $\sum_{n=0}^{\infty} a_n$ of non-negative terms for
convergence, we need to keep in mind the following three basic facts.


Fact 1: In order for $\sum_{n=0}^{\infty} a_n$ to converge it must happen that $\lim_{n\to\infty} a_n = 0$. (Think about it: if the individual terms of the series don’t get
small, there’s no hope that the series can converge. Furthermore,
this fact remains true even when not all of the terms of the series
are non-negative.)

Fact 2: If Fact 1 is true, we still need to show that there is some
number $M$ such that $\sum_{n=0}^{k} a_n \le M$ for all $k$.


Fact 3: Even when we have verified Facts 1 and 2, we still might
not (and usually won’t) know the actual limit of the infinite series
$\sum_{n=0}^{\infty} a_n$.


Warning about Fact 1: the requirement that $\lim_{n\to\infty} a_n = 0$ is a
necessary but not sufficient condition for convergence. Indeed, in
the above we saw that the series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges but that $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges.


EXERCISES 


1. Apply Fact 1 above to determine those series which definitely will 
not converge. 


EL EL 0 (-1)"n? 
Ratt Rin =O Lae 
wo Fr eee on) E = 

n=0 nr n=1 n=0 

come hile) ee a ee Inn 
(c) a (f) XV) sin n (i) »y me £1) 7 


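2. Occasionally an infinite series can be computed by using a partial
fraction decomposition. For example, note that $\frac{1}{n(n+1)} = \frac{1}{n} - \frac{1}{n+1}$
and so

$$\sum_{n=1}^{\infty} \frac{1}{n(n+1)} = \sum_{n=1}^{\infty} \left(\frac{1}{n} - \frac{1}{n+1}\right) = \left(1 - \frac12\right) + \left(\frac12 - \frac13\right) + \left(\frac13 - \frac14\right) + \cdots = 1.$$

Such a series is called a “telescoping series” because of all the
internal cancellations. Use the above idea to compute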
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me 1 Be 3 
a ——_—_ b 
el a (D) 2s oye gna 
3. Consider the series


il 
2S {= | the integer n doesn’t contain the digit 0| 


Therefore, the series © contains reciprocals of integers, except that, 
for example, 10 is thrown out, as is 20, as is 100, 101, etc. No Os 
are allowed! Determine whether this series converges. (Hint: 


1 1 
pa ee — tae | Pe 
Z 9 
: ; : : 
11 12 IO 2k 99 
| 1 1 | | iL 
111-112 999 
+ 
9? 98 
< 94 ) 
10. ~—-100 


4. (Formal definition of e) Consider the sequence $a_n = \left(1 + \frac{1}{n}\right)^n$, $n = 1, 2, \ldots$.


(a) Use the binomial theorem to show that $a_n < a_{n+1}$, $n = 1, 2, \ldots$. (Note that in the expansions of $a_n$ and $a_{n+1}$, the
latter has one additional term. Moreover, the terms of $a_n$ can
be made to correspond to terms of $a_{n+1}$ with each of the terms
of the latter being larger.)


(b) Show that, for each positive $n$, $a_n \le 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!}$.


(c) Conclude that $\lim_{n\to\infty} \left(1 + \frac{1}{n}\right)^n$ exists. The limit is the familiar
natural exponential base, $e$, and is often taken as the formal
definition.


(d) Show that for any real number $x$, $\lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n = e^x$. (Hint:
note that $\lim_{n\to\infty} \left(1 + \frac{x}{n}\right)^n = \lim_{m\to\infty} \left(1 + \frac{1}{m}\right)^{mx}$.)


5. Prove that the limit $\lim_{n\to\infty} \left(\sum_{k=1}^{n} \frac{1}{k} - \ln n\right)$ exists; its limit is called
Euler’s constant⁸ and is denoted $\gamma$ ($\approx 0.577$). To prove this just
draw a picture, observing that $\frac12 < \sum_{k=1}^{n} \frac{1}{k} - \ln n \le 1$
for all $n$ and that the sequence $a_n = \sum_{k=1}^{n} \frac{1}{k} - \ln n$ is a decreasing
sequence.⁹


5.2.2 Tests for convergence of non-negative term series 


In this subsection we'll gather together a few handy tests for conver- 
gence (or divergence). They are pretty intuitive, but still require prac- 
tice. 


The Limit Comparison Test


(i) Let $\sum_{n=0}^{\infty} a_n$ be a convergent series of positive terms and let $\sum_{n=0}^{\infty} b_n$
be a second series of positive terms. If for some $R$, $0 \le R < \infty$,

$$\lim_{n\to\infty} \frac{b_n}{a_n} = R,$$

then $\sum_{n=0}^{\infty} b_n$ also converges. (This is reasonable as it says that
asymptotically the series $\sum_{n=0}^{\infty} b_n$ is no larger than $R$ times the convergent series $\sum_{n=0}^{\infty} a_n$.)


(ii) Let $\sum_{n=0}^{\infty} a_n$ be a divergent series of positive terms and let $\sum_{n=0}^{\infty} b_n$ be
a second series of positive terms. If for some $R$, $0 < R \le \infty$,

$$\lim_{n\to\infty} \frac{b_n}{a_n} = R,$$

then $\sum_{n=0}^{\infty} b_n$ also diverges. (This is reasonable as it says that
asymptotically the series $\sum_{n=0}^{\infty} b_n$ is at least $R$ times the divergent
series $\sum_{n=0}^{\infty} a_n$.)


⁸or sometimes the Euler-Mascheroni constant

⁹Drawing a picture shows that

$$1 + \frac12 + \frac13 + \cdots + \frac{1}{n-1} - \ln n > \frac12\left(1 - \frac12\right) + \frac12\left(\frac12 - \frac13\right) + \cdots + \frac12\left(\frac{1}{n-1} - \frac{1}{n}\right) = \frac12 - \frac{1}{2n}.$$

Therefore,

$$1 + \frac12 + \frac13 + \cdots + \frac{1}{n} - \ln n > \frac12 - \frac{1}{2n} + \frac{1}{n} = \frac12 + \frac{1}{2n} > \frac12.$$

Next, that the sequence is decreasing follows from the simple observation that for all $n > 0$,
$\frac{1}{n+1} < \ln\left(\frac{n+1}{n}\right)$.
Finally, I’d like to mention in passing that, unlike the famous mathematical constants $\pi$ and $e$
(which are not only irrational but actually transcendental), it is not even known whether $\gamma$ is rational
or irrational.

Let’s look at a few examples! Before going into these examples, note
that we may use the facts that $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges and $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges.


EXAMPLE 1. The series >> na 


—,———— converges. We test this against 
9 2n* —N+ 2 


cca 
the convergent series }/ —,. Indeed, 
n=2 10 


lim 
n—- oo 


) 


1 
(sa—<s5) ag 
1 ae 


n2 


(after some work), proving convergence. 


00 1 
EXAMPLE 2. The series —_— diverges, as 
yi V/n+1l 6 
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oo i 
showing that the terms of the series ——— are asymptoticall 
8 ep 


CO 


i 
much bigger than the terms of the already divergent series }> —. There- 
n=1 1 


00 1 
fore, by the Limit Comparison Test, ——— diverges. 
, : Xu Jn+1 6 


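EXAMPLE 3. The series $\sum_{n=1}^{\infty} \frac{n^2 + 2n + 3}{n^{9/2}}$ converges. We compare it
with the convergent series $\sum_{n=1}^{\infty} \frac{1}{n^2}$:

$$\lim_{n\to\infty} \frac{\left(\dfrac{n^2 + 2n + 3}{n^{9/2}}\right)}{\left(\dfrac{1}{n^2}\right)} = \lim_{n\to\infty} \frac{n^2 + 2n + 3}{n^{5/2}} = 0,$$

proving convergence.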


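EXAMPLE 4. The series $\sum_{n=2}^{\infty} \frac{1}{(\ln n)^2}$ diverges. Watch this:

$$\lim_{n\to\infty} \frac{\left(\dfrac{1}{(\ln n)^2}\right)}{\left(\dfrac{1}{n}\right)} = \lim_{n\to\infty} \frac{n}{(\ln n)^2} \overset{\text{l'Hôpital}}{=} \lim_{n\to\infty} \frac{n}{2\ln n} \overset{\text{l'Hôpital}}{=} \lim_{n\to\infty} \frac{n}{2} = \infty.$$

This says that, asymptotically, the series $\sum_{n=2}^{\infty} \frac{1}{(\ln n)^2}$ is infinitely larger
than the divergent harmonic series $\sum_{n=2}^{\infty} \frac{1}{n}$, implying divergence.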
The next test will provide us with a rich assortment of series to test
with. (So far, we’ve only been testing against the convergent series
$\sum_{n=1}^{\infty} \frac{1}{n^2}$ and the divergent series $\sum_{n=1}^{\infty} \frac{1}{n}$.)
n=1 a n=1 


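The p-Series Test. Let $p$ be a real number. Then

$$\sum_{n=1}^{\infty} \frac{1}{n^p} \begin{cases} \text{converges if } p > 1, \\ \text{diverges if } p \le 1. \end{cases}$$

The p-series test is sometimes called the p-test for short; the proof
of the above is simple: just as we proved that $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converged by
comparing it with $\int_1^{\infty} \frac{dx}{x^2}$ (which converges) and that $\sum_{n=1}^{\infty} \frac{1}{n}$ diverged by
comparing with $\int_1^{\infty} \frac{dx}{x}$ (which diverges), we see that $\sum_{n=1}^{\infty} \frac{1}{n^p}$ will have
the same behavior as the improper integral $\int_1^{\infty} \frac{dx}{x^p}$. But, where $p \ne 1$,
we have

$$\int_1^{\infty} \frac{dx}{x^p} = \left.\frac{x^{1-p}}{1-p}\right|_1^{\infty} = \begin{cases} \dfrac{1}{p-1} & \text{if } p > 1, \\[6pt] \infty & \text{if } p < 1. \end{cases}$$

We already know that $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges, so we’re done!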
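
The comparison with the integral can be made concrete. For $p > 0$ the terms $1/n^p$ decrease, so the $k$-th partial sum is trapped between $\int_1^{k+1} x^{-p}\,dx$ and $1 + \int_1^{k} x^{-p}\,dx$. The sketch below (function names are our own) prints these three quantities; it is meant as an illustration of the argument, not a proof.

```python
import math


def integral_from_1(a, p):
    """Exact value of the integral of x**(-p) over [1, a], for a >= 1."""
    if p == 1:
        return math.log(a)
    return (a ** (1 - p) - 1) / (1 - p)


def p_series_partial_sum(p, k):
    """1/1^p + 1/2^p + ... + 1/k^p."""
    return sum(1 / n ** p for n in range(1, k + 1))


def integral_bounds(p, k):
    """(lower, partial sum, upper) for a decreasing positive term 1/n^p, p > 0."""
    return (integral_from_1(k + 1, p),
            p_series_partial_sum(p, k),
            1 + integral_from_1(k, p))


if __name__ == "__main__":
    for p in (0.5, 1, 2):
        for k in (10, 1000, 100000):
            print(p, k, integral_bounds(p, k))
```

For $p = 2$ the upper bound never exceeds 2, while for $p = 1$ and $p = 0.5$ the lower bound itself grows without limit, matching the two cases of the test.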


The p-Test works very well in conjunction with the Limit Com- 
parison Test. The following two examples might help. 


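EXAMPLE 5. $\sum_{n=1}^{\infty} \frac{n^2 + 2n + 3}{n^{7/2}}$ converges. We compare it with the series
$\sum_{n=1}^{\infty} \frac{1}{n^{3/2}}$, which, by the p-test converges:

$$\lim_{n\to\infty} \frac{\left(\dfrac{n^2 + 2n + 3}{n^{7/2}}\right)}{\left(\dfrac{1}{n^{3/2}}\right)} = \lim_{n\to\infty} \frac{n^2 + 2n + 3}{n^2} = 1,$$

proving convergence.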


2 

St PLZ SO , eee ’ 

EXAMPLE 6. >> a diverges. We compare it with the series 
n=1 nm 

S> , which, by the p-test diverges: 

n=1 


1 
yn 


proving divergence. 


There is one more very useful test, one which works particularly
well with expressions containing exponentials and/or factorials. This
method is based on the fact that if $|r| < 1$, then the infinite geometric
series $a + ar + ar^2 + \cdots$ converges (to $\frac{a}{1-r}$). In this test we do not
need to assume that the series consists only of non-negative terms.


The Ratio Test. Let $\sum_{n=0}^{\infty} a_n$ be an infinite series. Assume that

$$\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = R.$$

Then

(i) if $|R| < 1$, then $\sum_{n=0}^{\infty} a_n$ converges;

(ii) if $|R| > 1$, then $\sum_{n=0}^{\infty} a_n$ diverges;

(iii) if $|R| = 1$, then this test is inconclusive.


The reasoning behind the above is simple. First of all, in case (i) we
see that $\sum_{n=0}^{\infty} a_n$ is asymptotically a geometric series with ratio $|R| < 1$
and hence converges (but we still probably won’t know what the series
converges to). In case (ii) $\sum_{n=0}^{\infty} a_n$ will diverge since asymptotically
each term is $R$ times the previous one, which certainly implies that
$\lim_{n\to\infty} a_n \ne 0$, preventing convergence. Note that in the two cases $\sum_{n=1}^{\infty} \frac{1}{n}$
and $\sum_{n=1}^{\infty} \frac{1}{n^2}$ we have $\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = 1$,¹⁰ which is why this case is inconclusive.

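We turn again to some examples.


EXAMPLE 7. Consider the series $\sum_{n=1}^{\infty} \frac{(n+1)^3}{n!}$. We have

$$\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = \lim_{n\to\infty} \frac{\left(\dfrac{(n+2)^3}{(n+1)!}\right)}{\left(\dfrac{(n+1)^3}{n!}\right)} = \lim_{n\to\infty} \frac{(n+2)^3}{(n+1)(n+1)^3} = 0.$$

Therefore, $\sum_{n=1}^{\infty} \frac{(n+1)^3}{n!}$ converges.

EXAMPLE 8. Consider the series $\sum_{n=1}^{\infty} \frac{n^2}{3^n}$. We have

$$\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = \lim_{n\to\infty} \frac{\left(\dfrac{(n+1)^2}{3^{n+1}}\right)}{\left(\dfrac{n^2}{3^n}\right)} = \lim_{n\to\infty} \frac{(n+1)^2}{3n^2} = \frac13.$$

It follows, therefore, that $\sum_{n=1}^{\infty} \frac{n^2}{3^n}$ also converges.

EXAMPLE 9. Consider the series $\sum_{n=1}^{\infty} \frac{n!}{2^n}$. We have

$$\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = \lim_{n\to\infty} \frac{\left(\dfrac{(n+1)!}{2^{n+1}}\right)}{\left(\dfrac{n!}{2^n}\right)} = \lim_{n\to\infty} \frac{n+1}{2} = \infty,$$

which certainly proves divergence.


¹⁰Indeed, we have

$$\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = \lim_{n\to\infty} \frac{\left(\frac{1}{n+1}\right)}{\left(\frac{1}{n}\right)} = \lim_{n\to\infty} \frac{n}{n+1} = 1$$

in the first case, and that

$$\lim_{n\to\infty} \frac{a_{n+1}}{a_n} = \lim_{n\to\infty} \frac{\left(\frac{1}{(n+1)^2}\right)}{\left(\frac{1}{n^2}\right)} = \lim_{n\to\infty} \frac{n^2}{(n+1)^2} = 1$$

in the second case, despite the fact that the first series diverges and the second series converges.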
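
The ratios in Examples 7 to 9 can be tabulated exactly with rational arithmetic. In this sketch the series are taken to be $\sum (n+1)^3/n!$, $\sum n^2/3^n$ and $\sum n!/2^n$; the helper names (`ratio`, `example7_term`, and so on) are ours and purely illustrative.

```python
import math
from fractions import Fraction


def ratio(term, n):
    """Exact ratio a_(n+1) / a_n for a sequence given by term(n)."""
    return term(n + 1) / term(n)


def example7_term(n):
    return Fraction((n + 1) ** 3, math.factorial(n))


def example8_term(n):
    return Fraction(n ** 2, 3 ** n)


def example9_term(n):
    return Fraction(math.factorial(n), 2 ** n)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n,
              float(ratio(example7_term, n)),   # tends to 0
              float(ratio(example8_term, n)),   # tends to 1/3
              float(ratio(example9_term, n)))   # grows without bound
```

Using `Fraction` avoids the overflow that floating-point factorials would hit for large $n$; only the final ratio is converted to a float for display.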
EXERCISES 
1. Test each of the series below for convergence. 
n+ 2 © n?—n+2 


OD 2 ore ib) 2s 


nap nA +n? +1 
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. How about S> 
n=1 
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n? + 2n oar nk 
g a 
n=0 or ( ) n=1 2" 
00 (n a Iyge h oo 1 
te) 2 n! ( )d (1++)" 


2. As we have already seen, the series $\sum_{n=1}^{\infty} \frac{1}{n^2}$ converges. In fact, it is
known that $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$; a sketch of Euler’s proof is given on page
228 of the Haese-Harris textbook.¹¹ Using this fact, argue that

$$\sum_{n=0}^{\infty} \frac{1}{(2n+1)^2} = \frac{\pi^2}{8}.$$

3. Prove the sinusoidal p-series test, namely that

$$\sum_{n=1}^{\infty} \sin\left(\frac{2\pi}{n^p}\right) \begin{cases} \text{converges if } p > 1, \\ \text{diverges if } p \le 1. \end{cases}$$

(Of course, the $2\pi$ can be replaced by any constant. Exercise 2 on
page 256 is, of course, relevant here!)


. Recall Euler’s ¢-function ¢ (see page 63). Determine the behavior 


1 
of the series 
2 nan) 
n) 


of 


n2 


. (See Exercise 16d on page 64.) 


a 


. Let Fo, Fi, Fo,... be the terms of the Fibonacci sequence (see 


eo Lin 
page 93). Show that \~ an converges and compute this sum ex- 


n=0 
plicitly. (Hint: you’ll probably need to work through Exercise 7 
on page 106 first.) 


¹¹Peter Blythe, Peter Joseph, Paul Urban, David Martin, Robert Haese, and Michael Haese,
MATHEMATICS FOR THE INTERNATIONAL STUDENT; MATHEMATICS HL (OPTIONS), Haese and
Harris Publications, 2005, Adelaide, ISBN 1 876543 33 7
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7. As in the above sequence, Let Fo, Fi, Fo, ... be the terms of the 


comme | iP i) 
Fibonacci sequence. Show that >> i < a, where a = = 
k=0 Lk 
(the golden ratio). (Hint: show that if k > 2, then Fy, > a*!. 


and then use the ratio test.) 


8. Consider the generalized Fibonacci sequence (see Exercise 9 on
page 106) defined by $u_0 = a$, $u_1 = b$ and $u_{n+2} = u_{n+1} + u_n$. Show
that if $a$, $b$ are such that $u_n \to 0$ as $n \to \infty$, then $\sum_{n=0}^{\infty} u_n$ converges
and compute this sum in terms of $a$ and $b$.


5.2.3 Conditional and absolute convergence; alternating series


In this subsection we shall consider series of the form $\sum_{n=0}^{\infty} a_n$ where
the individual terms $a_n$ are not necessarily non-negative. We shall
first make the following useful definition. An infinite series $\sum_{n=0}^{\infty} a_n$ is
called absolutely convergent if the series $\sum_{n=0}^{\infty} |a_n|$ converges. This is
important because of the following result.


Theorem. If the series $\sum_{n=0}^{\infty} a_n$ is absolutely convergent, then it is convergent.


Proof. Note that we clearly have

$$0 \le a_n + |a_n| \le 2|a_n|, \quad n = 0, 1, 2, \ldots.$$

Since $\sum_{n=0}^{\infty} 2|a_n|$ converges, so does $\sum_{n=0}^{\infty} (a_n + |a_n|)$; call the limit $L$. Therefore,
$\sum_{n=0}^{\infty} a_n = L - \sum_{n=0}^{\infty} |a_n|$, proving that $\sum_{n=0}^{\infty} a_n$ converges, as well.


¹²The above says, of course, that the infinite series of the reciprocals of the Fibonacci numbers
converges. Its value is known to be an irrational number $\approx 3.35988566\ldots$


278 CHAPTER 5 SERIES AND DIFFERENTIAL EQUATIONS 


We consider a couple of simple illustrations of the above theorem. 


= oll 


EXAMPLE 1. The series > will converge by the above theorem, 


together with the p- Test. 


EXAMPLE 2. The series $\sum_{n=1}^{\infty} \frac{(-1)^n}{n}$ does not converge absolutely; however, we'll see below that this series does converge.


An infinite series $\sum_{n=0}^{\infty} a_n$ which converges but is not absolutely
convergent is called conditionally convergent. There are plenty of
conditionally convergent series, as guaranteed by the following theorem.


Theorem. (Alternating Series Test) Let $a_0 \ge a_1 \ge a_2 \ge \cdots \ge 0$
and satisfy $\lim_{n\to\infty} a_n = 0$. Then the "alternating series"
$\sum_{n=0}^{\infty} (-1)^n a_n$ converges.¹³


We'll conclude this section with two illustrations of the Alternating 
Series Test. 


EXAMPLE 3. We know that the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$
diverges; however, since the terms of this series decrease and tend to
zero, the Alternating Series Test guarantees that
$\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}$ converges. We'll
show later on that this actually converges to $\ln 2$ (see page 302).
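
The contrast in Example 3 can be seen numerically. The following is only an illustrative sketch (the function names are made up for this note, and a finite computation proves nothing about convergence): it tabulates partial sums of the harmonic series and of the alternating harmonic series $1 - \frac12 + \frac13 - \cdots$ next to $\ln 2$.

```python
import math


def harmonic_partial_sum(terms: int) -> float:
    """Return 1 + 1/2 + ... + 1/terms."""
    return sum(1.0 / n for n in range(1, terms + 1))


def alternating_harmonic_partial_sum(terms: int) -> float:
    """Return 1 - 1/2 + 1/3 - ... with the given number of terms."""
    return sum((-1) ** (n - 1) / n for n in range(1, terms + 1))


if __name__ == "__main__":
    for terms in (10, 1000, 100000):
        print(
            terms,
            harmonic_partial_sum(terms),              # keeps growing (slowly)
            alternating_harmonic_partial_sum(terms),  # settles down
            math.log(2),
        )
```

The harmonic partial sums keep creeping upward, while the alternating ones stay within roughly $1/(2N)$ of $\ln 2$ after $N$ terms.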


EXAMPLE 4. The series $\sum_{n=1}^{\infty} \frac{n}{n^2+1}$ can be shown to diverge by applying
the Limit Comparison Test with a comparison with the harmonic

13 The proof of this is pretty simple. First of all, note that the "even" partial sums satisfy

$$(a_0 - a_1) \le (a_0 - a_1) + (a_2 - a_3) \le (a_0 - a_1) + (a_2 - a_3) + (a_4 - a_5) \le \cdots,$$

so it suffices to show that these are all bounded by some number (see Figure 1, page 266). However,
note that

$$a_0 - \underbrace{\bigl((a_1 - a_2) + (a_3 - a_4) + (a_5 - a_6) + \cdots + (a_{2n-3} - a_{2n-2})\bigr)}_{\text{This is positive!}} - a_{2n-1} \le a_0,$$

so we're done.
series (do this!). However, the terms decrease and tend to zero and so
by the Alternating Series Test
$\sum_{n=1}^{\infty} \frac{(-1)^n n}{n^2+1}$ converges.


EXERCISES 


1. Test each of the series below for convergence. 


@ Saye ©) Syn 
ow) So © Sasay 
Ores” Ceo 

LCM Tae ED 2 al 
o > € : 2) th) 2 cL 


2. Determine whether each of the series above converges condition- 
ally, converges absolutely or diverges. 


3. Prove that the improper integral $\int_{-\infty}^{\infty} \frac{\sin x}{x}\,dx$ converges.¹⁴


4. Prove that the improper integral $\int_{0}^{\infty} \cos x^2\,dx$ converges.¹⁵ (Hint:
try the substitution $u = x^2$ and see if you can apply the Alternating
Series Test.)


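5. Consider the infinite series $\sum_{n=0}^{\infty} \frac{\epsilon_n}{2^n}$, where each
$\epsilon_n$ is $\pm 1$. Show that any real number $x$, $-2 \le x \le 2$, can be
represented by such a series by considering the steps below:


(a) Write $\Sigma = \sum_{n=0}^{\infty} \frac{\epsilon_n}{2^n} = \Sigma_+ - \Sigma_-$, where
$\Sigma_+$ is the sum of the positive terms in $\Sigma$ and where $\Sigma_-$ is
$-$(negative terms in $\Sigma$).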


14 In fact, it converges to $\pi$.


15 This can be shown to converge to $\frac{1}{2}\sqrt{\frac{\pi}{2}}$.
(See the footnote.¹⁶) By thinking in terms of binary decimal
representations (See, e.g., Exercise 4 on page 92) argue that
any real number $y$ with $0 \le y \le 2$ can be represented in the form
$\Sigma_+$.


(b) If $\Sigma_+ = y$, show that $\Sigma_+ - \Sigma_- = 2(y-1)$.
(c) Conclude that any number $x$ between $-2$ and $2$ can be represented
as $\Sigma_+ - \Sigma_-$ and hence as the infinite series
$\sum_{n=0}^{\infty} \frac{\epsilon_n}{2^n}$. (This result is not true if the series is
replaced by, say, one of the form $\sum_{n=0}^{\infty} \frac{\epsilon_n}{3^n}$. While the
values of such a series would always lie between $-\frac{3}{2}$ and $\frac{3}{2}$, and
despite the fact that uncountably many such numbers would occur, the set of
such numbers is still very small.¹⁷)
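
The claim of Exercise 5 can be explored with a small experiment. The sketch below is not the argument the exercise asks for; it uses a simple greedy rule (an assumption of this note, with made-up function names): at step $n$ choose $\epsilon_n = +1$ if the remaining target is non-negative and $-1$ otherwise. For $|x| \le 2$ the remainder after step $n$ is at most $2^{-n}$ in absolute value, so the partial sums approach $x$.

```python
def signs_for(x: float, terms: int) -> list:
    """Greedily pick eps_n in {+1, -1} so that sum eps_n / 2**n approaches x.

    Assumes -2 <= x <= 2.
    """
    signs = []
    remainder = x
    for n in range(terms):
        eps = 1 if remainder >= 0 else -1
        signs.append(eps)
        remainder -= eps / 2 ** n
    return signs


def partial_sum(signs: list) -> float:
    """Evaluate sum_{n} eps_n / 2**n for the given list of signs."""
    return sum(eps / 2 ** n for n, eps in enumerate(signs))


if __name__ == "__main__":
    for x in (-2.0, -0.3, 0.0, 1.25, 2.0):
        eps = signs_for(x, 50)
        print(x, partial_sum(eps), eps[:8])
```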


5.2.4 The Dirichlet test for convergence (optional discussion) 


There is a very convenient test which can be thought of as a
generalization of the alternating series test and often applies very nicely to
testing for conditional convergence. For example, we may wish to test
the series $\sum_{n=1}^{\infty} \frac{\cos n}{n}$ for convergence. It is not clear
whether this series is absolutely convergent¹⁸, nor is it an alternating series,
so none of the methods presented thus far apply.


Let us first consider the following very useful lemma. 


LEMMA. Let $(a_n)$ and $(b_n)$ be two sequences and set $s_n = \sum_{k=1}^{n} a_k$. Then
one has

16 There is an important result being used here, namely that if $\sum u_n$ is an absolutely convergent
series, then its sum is unaffected by any rearrangement of its terms.


17 The smallness of this set is best expressed by saying that it has "measure zero." Alternatively, if
we were to select a real number randomly from the interval $[-\frac{3}{2}, \frac{3}{2}]$, then the
probability of selecting a number of the form $\sum_{n=0}^{\infty} \frac{\epsilon_n}{3^n}$ is zero.
Finally, if instead of allowing the numerators $\epsilon_n$ to be $\pm 1$ we
insisted that they be either 0 or 2, then what results is the so-called Cantor Ternary Set (which
also has measure zero).

18 It's not, but this takes some work to show.
$$\sum_{k=1}^{n} a_k b_k = s_n b_{n+1} + \sum_{k=1}^{n} s_k (b_k - b_{k+1}).$$


PROOF. Setting $s_0 = 0$ we obviously have $a_k = s_k - s_{k-1}$, $k \ge 1$. From
this, one has

$$\sum_{k=1}^{n} a_k b_k = \sum_{k=1}^{n} (s_k - s_{k-1}) b_k
= \sum_{k=1}^{n} s_k b_k - \sum_{k=1}^{n} s_{k-1} b_k
= s_n b_{n+1} + \sum_{k=1}^{n} s_k (b_k - b_{k+1}).$$
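
The identity of the lemma is easy to sanity-check on concrete numbers. The following sketch (function names are invented for this note) computes both sides for lists $a_1, \ldots, a_n$ and $b_1, \ldots, b_{n+1}$; note that the right-hand side needs one more $b$ than the left.

```python
def direct_sum(a: list, b: list) -> float:
    """Left-hand side: sum of a_k * b_k for k = 1..n (b may have n+1 entries)."""
    return sum(a[k] * b[k] for k in range(len(a)))


def by_parts_sum(a: list, b: list) -> float:
    """Right-hand side: s_n * b_{n+1} + sum of s_k * (b_k - b_{k+1}).

    Here b must contain n + 1 entries b_1, ..., b_{n+1}.
    """
    n = len(a)
    s = 0.0       # running partial sum s_k = a_1 + ... + a_k
    total = 0.0
    for k in range(n):
        s += a[k]
        total += s * (b[k] - b[k + 1])
    return s * b[n] + total


if __name__ == "__main__":
    a = [1.0, -2.0, 3.0, 0.5]
    b = [4.0, 3.0, 2.0, 1.0, 0.5]
    print(direct_sum(a, b), by_parts_sum(a, b))  # the two values agree
```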


DIRICHLET'S THEOREM FOR CONVERGENCE. Let $(a_n)$ and $(b_n)$ be
two sequences of real numbers. If the partial sums
$\left|\sum_{k=1}^{n} a_k\right|$ are all bounded by some constant $M$, and if

$$b_1 \ge b_2 \ge b_3 \ge \cdots \ge 0 \quad \text{with} \quad \lim_{n\to\infty} b_n = 0,$$

then the series $\sum_{k=1}^{\infty} a_k b_k$ converges.


PROOF. Setting $s_n = \sum_{k=1}^{n} a_k$ and $r_n = \sum_{k=1}^{n} a_k b_k$ we have from the above
lemma that

$$r_n - r_m = s_n b_{n+1} - s_m b_{m+1} + \sum_{k=m+1}^{n} s_k (b_k - b_{k+1}),$$

where $n \ge m$ are positive indices. Taking absolute values and applying
the Triangle inequality provides

$$|r_n - r_m| \le |s_n| b_{n+1} + |s_m| b_{m+1} + \sum_{k=m+1}^{n} |s_k| (b_k - b_{k+1})
\le M b_{n+1} + M b_{m+1} + M \sum_{k=m+1}^{n} (b_k - b_{k+1}) = 2M b_{m+1}.$$
Since $b_k \to 0$ as $k \to \infty$, we conclude that $(r_n)$ is a Cauchy sequence
of real numbers, hence converges.


As a corollary to the above, let's consider the convergence of the
series alluded to above, viz., $\sum_{n=1}^{\infty} \frac{\cos n}{n}$. To do this, we
start with the fact that for all integers $k$,

$$2\sin(1/2)\cos k = \sin(k + 1/2) - \sin(k - 1/2).$$

Therefore,

$$|2\sin(1/2)| \cdot \left|\sum_{k=1}^{n} \cos k\right|
= \left|\sum_{k=1}^{n} \bigl(\sin(k + 1/2) - \sin(k - 1/2)\bigr)\right|
= \left|\sin(n + 1/2) - \sin(1/2)\right| \le 2.$$

Since $\sin(1/2) \ne 0$ we already see that $\sum_{k=1}^{n} \cos k$ is bounded, and
Dirichlet's theorem applies, proving the convergence of
$\sum_{n=1}^{\infty} \frac{\cos n}{n}$.
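
The two ingredients of this argument can be checked numerically. The sketch below (names invented here; a finite experiment only illustrates the theorem) verifies that the partial sums of $\cos k$ stay below the bound $1/|\sin(1/2)|$ obtained above, and shows the partial sums of $\sum \cos n / n$ settling down.

```python
import math

# From |2 sin(1/2)| * |sum_{k=1}^n cos k| <= 2 we get this bound M.
BOUND = 1.0 / abs(math.sin(0.5))


def cos_partial_sums(n_max: int) -> list:
    """Return [cos 1, cos 1 + cos 2, ..., cos 1 + ... + cos n_max]."""
    sums = []
    s = 0.0
    for k in range(1, n_max + 1):
        s += math.cos(k)
        sums.append(s)
    return sums


def dirichlet_partial_sum(n_max: int) -> float:
    """Partial sum of the series sum_{n >= 1} cos(n) / n."""
    return sum(math.cos(n) / n for n in range(1, n_max + 1))


if __name__ == "__main__":
    print(max(abs(s) for s in cos_partial_sums(10000)), "<=", BOUND)
    for n_max in (1000, 100000, 200000):
        print(n_max, dirichlet_partial_sum(n_max))
```

By the estimate $|r_n - r_m| \le 2M b_{m+1}$ from the proof, partial sums beyond index $m$ differ from one another by at most $2M/(m+1)$.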


EXERCISES 


1. Strengthen the result proved above by proving that the series
$\sum_{n=1}^{\infty} \frac{\cos nx}{n^p}$ converges whenever $p > 0$, and $x$ is not
an integral multiple of $2\pi$.


2. Prove that $\sum_{n=1}^{\infty} \frac{\sin nx}{n^p}$ converges whenever $p > 0$.


5.3 The Concept of a Power Series 


Let us return to the familiar geometric series, with ratio $r$ satisfying
$|r| < 1$:

$$a + ar + ar^2 + \cdots = \frac{a}{1-r}.$$


Let's make a minor cosmetic change: rather than writing $r$ in the
above sum, we shall write $x$:

$$a + ax + ax^2 + \cdots = \frac{a}{1-x}, \quad |x| < 1.$$


In other words, if we set

$$f(x) = a + ax + ax^2 + \cdots = \sum_{n=0}^{\infty} a x^n \quad \text{and set} \quad g(x) = \frac{a}{1-x},$$


then the following facts emerge: 


(a) The domain of $f$ is $-1 < x < 1$, and the domain of $g$ is $x \ne 1$.


(b) $f(x) = g(x)$ for all $x$ in the interval $-1 < x < 1$.


We say, therefore, that $\sum_{n=0}^{\infty} a x^n$ is the power series representation
of $g(x)$, valid on the interval $-1 < x < 1$.
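
Facts (a) and (b) can be illustrated with a few lines of code. This is only a sketch with made-up names: inside $-1 < x < 1$ the partial sums of $\sum a x^n$ approach $g(x) = a/(1-x)$, while outside that interval $g$ is still defined (for $x \ne 1$) but the partial sums run away.

```python
def geometric_partial_sum(a: float, x: float, terms: int) -> float:
    """Return a + a*x + a*x**2 + ... with the given number of terms."""
    return sum(a * x ** n for n in range(terms))


def g(a: float, x: float) -> float:
    """The closed form a / (1 - x), defined for every x except 1."""
    return a / (1.0 - x)


if __name__ == "__main__":
    a = 3.0
    for x in (0.5, -0.9, 2.0):
        print(x, geometric_partial_sum(a, x, 60), g(a, x))
```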


So what is a power series anyway? Well, it’s just an expression of 
the form 


$$\sum_{n=0}^{\infty} a_n x^n,$$

where $a_0, a_1, a_2, \ldots$ are just real constants. For any particular value of
$x$ this infinite sum may or may not converge; we'll have plenty to
say about issues of convergence.
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5.3.1 Radius and interval of convergence 


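Our primary tool in determining the convergence properties of a power
series $\sum_{n=0}^{\infty} a_n x^n$ will be the Ratio Test. Recall that the series
$\sum_{n=0}^{\infty} |a_n x^n|$ will converge if

$$\lim_{n\to\infty} \frac{|a_{n+1} x^{n+1}|}{|a_n x^n|} = |x| \lim_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| < 1,$$

which means that

$$\sum_{n=0}^{\infty} a_n x^n \ \text{is absolutely convergent for all } x \text{ satisfying } |x| < \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right|.$$

The quantity $R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right|$ is sometimes called the
radius of convergence of the power series $\sum_{n=0}^{\infty} a_n x^n$. Again, as long
as $-R < x < R$, we are guaranteed that $\sum_{n=0}^{\infty} a_n x^n$ is absolutely
convergent and hence convergent.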


A few simple examples should be instructive. 


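EXAMPLE 1. The power series $\sum_{n=0}^{\infty} \frac{(-1)^n x^n}{2n+1}$ has radius of convergence

$$R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{2n+3}{2n+1} = 1.$$

This means that the above power series has radius of convergence 1 and
so the series is absolutely convergent for $-1 < x < 1$.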


EXAMPLE 2. The power series $\sum_{n=0}^{\infty} \frac{n x^n}{2^n}$ has radius of convergence

$$R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{n/2^n}{(n+1)/2^{n+1}} = \lim_{n\to\infty} \frac{2n}{n+1} = 2,$$

so in this case the radius of convergence is 2, which guarantees that the
power series converges for all $x$ satisfying $-2 < x < 2$.


EXAMPLE 3. Consider the power series $\sum_{n=0}^{\infty} \frac{(-1)^n x^n}{n!}$. In this
case the radius of convergence is similarly computed:

$$R = \lim_{n\to\infty} \left|\frac{a_n}{a_{n+1}}\right| = \lim_{n\to\infty} \frac{(n+1)!}{n!} = \lim_{n\to\infty} (n+1) = \infty.$$

This infinite radius of convergence means that the power series
$\sum_{n=0}^{\infty} \frac{(-1)^n x^n}{n!}$ actually converges for all real numbers $x$.


EXAMPLE 4. We consider here the series $\sum_{n=1}^{\infty} \frac{(x+2)^n}{n 2^n}$, which
has radius of convergence

$$R = \lim_{n\to\infty} \frac{(n+1)\,2^{n+1}}{n\,2^n} = 2.$$

This means that the series will converge where $|x + 2| < 2$, i.e., where
$-4 < x < 0$.


EXAMPLE 5. Here we consider the power series $\sum_{n=0}^{\infty} \frac{n x^{2n}}{2^n}$. The
radius of convergence is

$$R = \lim_{n\to\infty} \frac{2^{n+1}\, n}{2^n (n+1)} = 2.$$

But this is a power series in $x^2$ and so will converge if $x^2 < 2$. This
gives convergence on the interval $-\sqrt{2} < x < \sqrt{2}$.
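
The limits in Examples 1 through 4 can be estimated by evaluating $|a_n / a_{n+1}|$ at a moderately large index. The sketch below is a rough numerical aid only (the helper name and the choice $n = 100$ are assumptions of this note); it does not replace computing the limit.

```python
import math


def ratio_estimate(coeff, n: int) -> float:
    """Return |a_n / a_{n+1}| for a coefficient function coeff(n)."""
    return abs(coeff(n) / coeff(n + 1))


# Coefficients a_n of the power series in Examples 1-4.
EXAMPLES = {
    "Example 1": lambda n: (-1) ** n / (2 * n + 1),
    "Example 2": lambda n: n / 2 ** n,
    "Example 3": lambda n: (-1) ** n / math.factorial(n),
    "Example 4": lambda n: 1 / (n * 2 ** n),
}

if __name__ == "__main__":
    for name, coeff in EXAMPLES.items():
        print(name, ratio_estimate(coeff, 100))
```

At $n = 100$ the estimates are close to 1, 2, and 2 for Examples 1, 2, and 4, while for Example 3 the ratio equals $n + 1$ and keeps growing, in line with $R = \infty$.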
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In the examples above we computed intervals within which we are 
guaranteed convergence of the power series. Next, note that for values 
of x outside the radius of convergence we cannot have convergence, 
for then the limit of the ratios will be greater than 1, preventing the 
individual terms from approaching 0. This raises the question of con- 
vergence at the endpoints. We shall investigate this in the examples 
already considered above. 


EXAMPLE 6. We have seen that the power series $\sum_{n=0}^{\infty} \frac{(-1)^n x^n}{2n+1}$ has
radius of convergence $R = 1$ meaning that this series will converge in
the interval $-1 < x < 1$. What if $x = -1$? What if $x = 1$? Well,
we can take these up separately, using the methods of the previous two
subsections. If $x = -1$, we have the series

$$\sum_{n=0}^{\infty} \frac{(-1)^n (-1)^n}{2n+1} = \sum_{n=0}^{\infty} \frac{1}{2n+1},$$

which diverges (use the Limit Comparison Test against the harmonic
series). If $x = 1$, then the series becomes

$$\sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1},$$

which converges by the Alternating Series Test. We therefore know
the full story and can state the interval of convergence:

$$\sum_{n=0}^{\infty} \frac{(-1)^n x^n}{2n+1} \ \text{has interval of convergence} \ -1 < x \le 1.$$


EXAMPLE 7. We saw that the power series $\sum_{n=0}^{\infty} \frac{n x^n}{2^n}$ has radius of
convergence $R = 2$. What about the behavior of the series when $x = \pm 2$?
If $x = -2$ we have the series

$$\sum_{n=0}^{\infty} \frac{n(-2)^n}{2^n} = \sum_{n=0}^{\infty} (-1)^n n, \ \text{which diverges},$$

whereas when $x = 2$, we have

$$\sum_{n=0}^{\infty} \frac{n 2^n}{2^n} = \sum_{n=0}^{\infty} n, \ \text{which also diverges}.$$

Therefore,

$$\sum_{n=0}^{\infty} \frac{n x^n}{2^n} \ \text{has interval of convergence} \ -2 < x < 2.$$
n=0 


EXAMPLE 8. The power series $\sum_{n=0}^{\infty} \frac{(-1)^n x^n}{n!}$ has infinite radius of
convergence so there are no endpoints to check.


Before closing this section, we should mention that not all power
series are of the form $\sum_{n=0}^{\infty} a_n x^n$; they may appear in a "translated format,"
say, one like $\sum_{n=0}^{\infty} a_n (x - a)^n$, where $a$ is a constant. For example,
consider the series in Example 4, on page 285. What would the interval of
convergence look like here? We already saw that this series was guaranteed
to converge on the interval $-4 < x < 0$. If $x = -4$, then this series
is the convergent alternating harmonic series. If $x = 0$, then the series
becomes the divergent harmonic series. Summarizing, the interval of
convergence is $-4 \le x < 0$.


EXERCISES 


1. Determine the radius of convergence of each of the power series 


below: 
Se co (—1)"(4 + 2)” 
Cea wes af 
(») n=o0 2” ; (g) » ery" 
os ee h) Sa 
(a) & Ev" mat (lt) 
n=o 3" oon Inwen” 
2. Determine the interval of convergence of each of the power series


below: 

Oe an oo (—1)"(a + 2)" 
(a) atl (f) xu ae 
() foo 2? (8) Xu n! 
© Dam 0) 3 nae 

0 (—1)"(a — 2)?" . © (-1)?nIn nz” 
(d) & a (i) x mi 

(—1)"(a + 2)” Rie Oa ery 


@ 
Me 


3 
II 
un 
= 
NO 
3 
i 
oO 
i) 
3 


5.4 Polynomial Approximations; Maclaurin and Taylor Expansions


Way back in our study of the linearization of a function we saw that
it was occasionally convenient and useful to approximate a function by
one of its tangent lines. More precisely, if $f$ is a differentiable function,
and if $a$ is a value in its domain, then we have the approximation

$$f'(a) \approx \frac{f(x) - f(a)}{x - a},$$

which results in

$$f(x) \approx f(a) + f'(a)(x - a) \quad \text{for } x \text{ near } a.$$


A graph of this situation should help remind the student of how good 
(or bad) such an approximation might be: 
[Figure: the tangent line $y = f(a) + f'(a)(x - a)$ at the point $(a, f(a))$.]

Notice that as long as x does not move too far away from the point a, 
then the above approximation is pretty good. 


Let's take a second look at the above. Note that in approximating
a function $f$ by a linear function $L$ near the point $a$, then


(i) The graph of $L$ will pass through the point $(a, f(a))$, i.e., $L(a) = f(a)$, and


(ii) The slope of the line $y = L(x)$ will be the same as the derivative
of $f$ at $x = a$, i.e., $L'(a) = f'(a)$.


That is to say, the "best" linear function $L$ to use in approximating $f$
near $a$ is one whose 0-th and first derivatives at $x = a$ are the same as
for $f$:

$$L(a) = f(a) \quad \text{and} \quad L'(a) = f'(a).$$


So what if instead of using a straight line to approximate f we were 
to use a quadratic function Q? What, then, would be the natural 
requirements? Well, in analogy with what was said above we would 
require f and Q to have the same first three derivatives (0-th, first, and 
second) at x = a: 


$$Q(a) = f(a), \quad Q'(a) = f'(a), \quad \text{and} \quad Q''(a) = f''(a).$$
Such a quadratic function is actually very easy to build: the result
would be that

$$Q(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2.$$


(The reader should pause here to verify that the above quadratic func- 
tion really does have the same first three derivatives as f at x = a.) 


This "second-order" approximation is depicted here. Notice the
improvement over the linear approximation.


[Figure: the quadratic approximation $y = f(a) + f'(a)(x-a) + f''(a)(x-a)^2/2$ near the point $(a, f(a))$.]


In general, we may approximate a function with a polynomial $P_n(x)$
of degree $n$ by insisting that this polynomial have all of its first $n + 1$
derivatives at $x = a$ equal those of $f$:

$$P_n(a) = f(a),\quad P_n'(a) = f'(a),\quad P_n''(a) = f''(a),\quad \ldots,\quad P_n^{(n)}(a) = f^{(n)}(a),$$

where, in general, $f^{(k)}(x)$ denotes the $k$-th derivative of $f$ at $x$. It is
easy to see that the following gives a recipe for $P_n(x)$:

$$P_n(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n
= \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k.$$

The polynomial $P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(a)}{k!}(x-a)^k$ is called the Taylor
polynomial of degree $n$ for $f$ at $x = a$. If $a = 0$, the above polynomial
becomes

$$P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(0)}{k!} x^k$$

and is usually called the Maclaurin polynomial of degree $n$ for $f$.


What if, instead of stopping at a degree $n$ polynomial, we continued
the process indefinitely to obtain a power series? This is possible and
we obtain

$$\sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k \quad \text{Taylor series for } f \text{ at } x = a, \text{ and}$$

$$\sum_{k=0}^{\infty} \frac{f^{(k)}(0)}{k!} x^k \quad \text{Maclaurin series for } f.$$


Warning. It is very tempting to assume that the Taylor series for a
function $f$ will actually converge to $f(x)$ on its interval of convergence,
that is,

$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k.$$

For most of the functions we've considered here, this is true, but the
general result can fail.¹⁹ As a result, we shall adopt the notation

$$f(x) \sim \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k$$

to mean that $f(x)$ is represented by the power series
$\sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!}(x-a)^k$; in Subsection 5.4.2 we'll worry about
whether "$\sim$" can be replaced with "$=$". First, however, we shall delve into
some computations.


5.4.1 Computations and tricks 


In this subsection we'll give some computations of some Taylor and 
Maclaurin series, and provide some interesting shortcuts along the way. 


EXAMPLE 1. Let $f(x) = \sin x$ and find its Maclaurin series expansion.
This is simple as the derivatives (at $x = 0$) are easy to compute:

$$f^{(0)}(0) = \sin 0 = 0,\quad f'(0) = \cos 0 = 1,\quad f''(0) = -\sin 0 = 0,$$

$$f'''(0) = -\cos 0 = -1,\quad f^{(4)}(0) = \sin 0 = 0,$$

and the pattern repeats all over again. This immediately gives the
Maclaurin series for $\sin x$:

$$\sin x \sim x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}.$$
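
The recipe for $P_n(x)$ translates directly into code once the derivative values at $a$ are known. The sketch below (function names are invented for this note) builds a Taylor polynomial from a list $[f(a), f'(a), f''(a), \ldots]$ and applies it to $\sin x$ at $a = 0$, using the repeating pattern $0, 1, 0, -1$ found in Example 1.

```python
import math


def taylor_polynomial(derivatives_at_a: list, a: float, x: float) -> float:
    """Evaluate sum_k f^(k)(a) / k! * (x - a)**k for the supplied derivatives."""
    return sum(
        d / math.factorial(k) * (x - a) ** k
        for k, d in enumerate(derivatives_at_a)
    )


def sin_derivatives_at_zero(count: int) -> list:
    """Derivatives of sin at 0 follow the cycle 0, 1, 0, -1."""
    cycle = [0.0, 1.0, 0.0, -1.0]
    return [cycle[k % 4] for k in range(count)]


if __name__ == "__main__":
    x = 1.0
    for count in (2, 4, 8, 20):
        approx = taylor_polynomial(sin_derivatives_at_zero(count), 0.0, x)
        print(count, approx, math.sin(x))
```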


19 As an example, consider the function $f$ defined by setting

$$f(x) = \begin{cases} e^{-1/x^2} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases}$$

One can show that all derivatives of $f$ vanish at $x = 0$ and so $f$ cannot equal its Maclaurin series in
any interval about $x = 0$.
EXAMPLE 2. (A handy trick) If we wish to compute the Maclaurin
series for $\cos x$, we could certainly follow the same procedure as for the
$\sin x$ in the above example. However, since $\cos x = \frac{d}{dx} \sin x$, we can
likewise obtain the Maclaurin series for the $\cos x$ by differentiating that
for the $\sin x$; this yields the series

$$\cos x \sim \frac{d}{dx} \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!} = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}.$$

EXAMPLE 3. If we wanted to compute the Maclaurin series for $\sin x^2$,
then computing the higher-order derivatives of $\sin x^2$ would be
extremely tedious! A more sensible alternative would be to simply replace
$x$ by $x^2$ in the Maclaurin series for $\sin x$:

$$\sin x^2 \sim x^2 - \frac{x^6}{3!} + \frac{x^{10}}{5!} - \frac{x^{14}}{7!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{4n+2}}{(2n+1)!}.$$


EXAMPLE 4. (A handy trick) Since $\ln(1+x) = \int \frac{dx}{1+x}$ we may start
with the geometric series

$$1 - x + x^2 - \cdots = \sum_{n=0}^{\infty} (-1)^n x^n = \frac{1}{1+x},$$

and then integrate each term to obtain the Maclaurin series for $\ln(1 + x)$:

$$\ln(1+x) \sim x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{n+1}}{n+1}.$$

(Note that there is no constant occurring in the above integrations since
when $x = 0$, $\ln(1+x) = \ln 1 = 0$.)


EXAMPLE 5. Since $\frac{d^n}{dx^n} e^x = e^x = 1$ at $x = 0$, we immediately have the
Maclaurin series expansion for $e^x$:

$$e^x \sim 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots = \sum_{n=0}^{\infty} \frac{x^n}{n!}.$$




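EXAMPLE 6. (A handy trick) In the above series we may substitute
$-x^2$ for $x$ and get the Maclaurin series for $e^{-x^2}$:

$$e^{-x^2} \sim 1 - x^2 + \frac{x^4}{2!} - \frac{x^6}{3!} + \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{n!}.$$

Note how easy this is compared to having to calculate $\frac{d^n}{dx^n}\left(e^{-x^2}\right)$ (where
the chain and product rules very much complicate things!).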


EXAMPLE 7. (A handy trick) Let's find the Maclaurin series expansion
of the function $f(x) = \frac{\sin x}{x}$. In this case, we certainly wouldn't
want to compute successive derivatives of the quotient $\frac{\sin x}{x}$. However,
remembering the Maclaurin series expansion of $\sin x$ and then dividing
by $x$ will accomplish the same thing much more easily; the resulting
series is

$$\frac{\sin x}{x} \sim 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n+1)!}.$$


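EXAMPLE 8. The Taylor series expansion of $\cos x$ about $x = \frac{\pi}{2}$.
We have, where $f(x) = \cos x$, that $f\left(\frac{\pi}{2}\right) = \cos\left(\frac{\pi}{2}\right) = 0$,
$f'\left(\frac{\pi}{2}\right) = -\sin\left(\frac{\pi}{2}\right) = -1$,
$f''\left(\frac{\pi}{2}\right) = -\cos\left(\frac{\pi}{2}\right) = 0$,
$f'''\left(\frac{\pi}{2}\right) = \sin\left(\frac{\pi}{2}\right) = 1$, after
which point the cycle repeats. Therefore, the Taylor series is

$$\cos x \sim -\left(x - \frac{\pi}{2}\right) + \frac{\left(x - \frac{\pi}{2}\right)^3}{3!} - \frac{\left(x - \frac{\pi}{2}\right)^5}{5!} + \cdots
= \sum_{n=1}^{\infty} (-1)^n \frac{\left(x - \frac{\pi}{2}\right)^{2n-1}}{(2n-1)!}.$$
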
EXAMPLE 9. (A handy trick) We know that

$$1 + x + x^2 + x^3 + \cdots = \sum_{n=0}^{\infty} x^n = \frac{1}{1-x}, \quad \text{valid for } |x| < 1.$$

We can get further valid sums by differentiating the above:

$$1 + 2x + 3x^2 + 4x^3 + \cdots = \sum_{n=0}^{\infty} (n+1)x^n = \frac{d}{dx}\left(\frac{1}{1-x}\right) = \frac{1}{(1-x)^2}, \quad \text{valid for } |x| < 1.$$

Further valid sums can be obtained by differentiating.
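
The integration trick of Example 4 and the differentiation trick of Example 9 can both be spot-checked at a point inside the interval of convergence. This is only a numerical sketch with made-up function names; it compares partial sums with the closed forms $\ln(1+x)$ and $1/(1-x)^2$.

```python
import math


def log_series(x: float, terms: int) -> float:
    """Partial sum of sum_{n >= 0} (-1)**n * x**(n+1) / (n+1)."""
    return sum((-1) ** n * x ** (n + 1) / (n + 1) for n in range(terms))


def differentiated_geometric(x: float, terms: int) -> float:
    """Partial sum of sum_{n >= 0} (n+1) * x**n."""
    return sum((n + 1) * x ** n for n in range(terms))


if __name__ == "__main__":
    x = 0.5
    print(log_series(x, 60), math.log(1 + x))
    print(differentiated_geometric(x, 80), 1 / (1 - x) ** 2)
```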


EXERCISES 


1. Find the Maclaurin series expansion for each of the functions be- 


low: 
(a) a (f) sin? x (Hint: Use a double- 
1—«? angle identity.) 
2x 
(b) 1 
Oy 
me i (6) (1 — 2) 
el eo (h) In(1 + 2?) 
(d) G=27 (i) tan”! 4x 
(e) x sina (j) ze” 


2. Find the Maclaurin series expansion for the rational function 
f(a) oa al 
x) = ————_-. 
a ae 
priate trick.) 


(Don’t try to do this directly; use an appro- 


3. Sum the following series: 


(a) Se +1)" we ye 
(b) Ye m(e +)" oF (cane 
0 > Gar () Stet 

4. Sum the following numerical series: 
@) § oer @ for 


oa en (Se 
(—1)"(n2)" 
n=0 (n a 1)! 
5. Consider the function defined by setting $f(x) = \ln(1 + \sin x)$.
(a) Determine the Maclaurin series expansion for $f(x)$ through
the $x^4$ term.


(b) Using (a) determine the Maclaurin series expansion for the
function $g(x) = \ln(1 - \sin x)$.


(c) Using (a) and (b), determine the Maclaurin series expansion
for $\ln \sec x$.


6. Consider the following integral

$$I = \int_0^1 \left( \int_0^1 \frac{dy}{1 - xy} \right) dx.$$

(a) Using integration by parts, show that the internal integral is
equal to $\frac{-\ln(1 - x)}{x}$.
(b) Determine the Maclaurin series expansion for this.
(c) Use a term-by-term integration to show that

$$I = \sum_{n=1}^{\infty} \frac{1}{n^2}$$

(which, as mentioned on page 276 is $\frac{\pi^2}{6}$. See the footnote.²⁰)


oe) rT? 


7. Here's a self-contained proof that $\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}$. (See the footnote.²¹)


Step 1. To show that for any positive integer $m$,

$$\sum_{k=1}^{m} \cot^2 \frac{k\pi}{2m+1} = \frac{m(2m-1)}{3}.$$

To complete step 1, carry out the following arguments.


20 Using the variable change $u = \frac{1}{2}(y + x)$, $v = \frac{1}{2}(y - x)$, one can show that the above "double
integral" is equal to $\frac{\pi^2}{6}$, giving an alternative proof to that alluded in HH, Exercise 15, page 228.
Showing that the double integral has the correct value requires some work!


21 This is distilled from I. Papadimitriou, A Simple Proof of the Formula $\sum_{k=1}^{\infty} \frac{1}{k^2} = \frac{\pi^2}{6}$. Amer.
Math. Monthly 80, 424-425, 1973.
(i) By equating the imaginary parts of DeMoivre's formula
$\cos n\theta + i \sin n\theta = (\cos\theta + i \sin\theta)^n = \sin^n\theta\,(\cot\theta + i)^n$,
obtain the identity

$$\sin n\theta = \sin^n\theta \left[ \binom{n}{1} \cot^{n-1}\theta - \binom{n}{3} \cot^{n-3}\theta + \binom{n}{5} \cot^{n-5}\theta - \cdots \right].$$

(ii) Let $n = 2m + 1$ and express the above as
$\sin(2m+1)\theta = \sin^{2m+1}\theta\, P_m(\cot^2\theta)$, $0 < \theta < \frac{\pi}{2}$,
where $P_m(x)$ is the polynomial of degree $m$ given by

$$P_m(x) = \binom{2m+1}{1} x^m - \binom{2m+1}{3} x^{m-1} + \binom{2m+1}{5} x^{m-2} - \cdots.$$

(iii) Conclude that the real numbers

$$x_k = \cot^2\left(\frac{k\pi}{2m+1}\right), \quad 1 \le k \le m,$$

are zeros of $P_m(x)$, and that they are all distinct.
Therefore, $x_1, x_2, \ldots, x_m$ comprise all of the zeros of $P_m(x)$.
(iv) Conclude from part (iii) that

$$\sum_{k=1}^{m} \cot^2 \frac{k\pi}{2m+1} = \sum_{k=1}^{m} x_k = \binom{2m+1}{3} \Big/ \binom{2m+1}{1} = \frac{m(2m-1)}{3},$$

proving the claim of Step 1.


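Step 2. Starting with the familiar inequality $\sin x < x < \tan x$
for $0 < x < \pi/2$, show that

$$\cot^2 x < \frac{1}{x^2} < 1 + \cot^2 x, \quad 0 < x < \frac{\pi}{2}.$$

Step 3. Put $x = \frac{k\pi}{2m+1}$, where $k$ and $m$ are positive integers
and $1 \le k \le m$, and infer that

$$\sum_{k=1}^{m} \cot^2\left(\frac{k\pi}{2m+1}\right) < \frac{(2m+1)^2}{\pi^2} \sum_{k=1}^{m} \frac{1}{k^2} < m + \sum_{k=1}^{m} \cot^2\left(\frac{k\pi}{2m+1}\right).$$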
Step 4. Use step 1 to write the above as

$$\frac{m(2m-1)}{3} < \frac{(2m+1)^2}{\pi^2} \sum_{k=1}^{m} \frac{1}{k^2} < m + \frac{m(2m-1)}{3}.$$

Step 5. Multiply the inequality of step 4 through by $\frac{\pi^2}{4m^2}$ and let
$m \to \infty$. What do you get?
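
Before working through the steps, it can be reassuring to test them numerically. The sketch below (function names invented for this note) checks the cotangent identity of Step 1 for a particular $m$ and evaluates the two bounds of Step 4 after dividing through by $(2m+1)^2/\pi^2$, so that the partial sum $\sum_{k=1}^{m} 1/k^2$ sits in the middle.

```python
import math


def cot_squared_sum(m: int) -> float:
    """Return sum_{k=1}^{m} cot^2(k*pi / (2m+1))."""
    return sum(
        1.0 / math.tan(k * math.pi / (2 * m + 1)) ** 2 for k in range(1, m + 1)
    )


def squeeze(m: int) -> tuple:
    """Return (lower bound, partial sum of 1/k^2, upper bound) from Step 4."""
    scale = math.pi ** 2 / (2 * m + 1) ** 2
    lower = scale * m * (2 * m - 1) / 3
    upper = scale * (m + m * (2 * m - 1) / 3)
    partial = sum(1.0 / k ** 2 for k in range(1, m + 1))
    return lower, partial, upper


if __name__ == "__main__":
    print(cot_squared_sum(50), 50 * 99 / 3)
    for m in (10, 100, 10000):
        print(m, squeeze(m), math.pi ** 2 / 6)
```

As $m$ grows, both outer values close in on $\pi^2/6 \approx 1.644934$, which is what Step 5 asks the reader to establish.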


5.4.2. Error analysis and Taylor’s theorem 


In this final subsection we wish to address two important questions: 


Question A: If P,(x) is the Maclaurin (or Taylor) polynomial of 
degree n for the function f(x), how good is the approximation 
f(x) = P,(x)? More precisely, how large can the error 


|f(x) — Pr(x)| be? 


Question B: When can we say that the Maclaurin or Taylor series of
$f(x)$ actually converges to $f(x)$?


The answers to these questions are highly related. 


The answer to both of these questions is actually contained in Tay- 
lor’s Theorem with Remainder. Before stating this theorem, I 
want to indicate that this theorem is really just a generalization of the 
Mean Value Theorem, which I'll state below as a reminder. 


Mean Value Theorem. Let $f$ be a differentiable function on some
open interval $I$. If $a$ and $x$ are both in $I$, then there exists a real number
$c$ between $a$ and $x$ such that

$$\frac{f(x) - f(a)}{x - a} = f'(c).$$

Put differently, there exists a number $c$ between $a$ and $x$ such that

$$f(x) = f(a) + f'(c)(x - a).$$
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Having been reminded of the Mean Value Theorem, perhaps now 
Taylor’s Theorem with Remainder won’t seem so strange. Here it 
is. 


Taylor's Theorem with Remainder. Let $f$ be an infinitely-differentiable
function on an open interval $I$. If $a$ and $x$ are in $I$, and if $n$ is a
non-negative integer, then there exists a real number $c$ between $a$ and $x$
such that

$$f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x - a)^n
+ \underbrace{\frac{f^{(n+1)}(c)}{(n+1)!}(x - a)^{n+1}}_{\text{this is the remainder}}.$$


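PROOF.²² We start by proving that, for all $n \ge 0$,

$$f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots + \frac{f^{(n)}(a)}{n!}(x-a)^n + \int_a^x \frac{f^{(n+1)}(t)}{n!}(x-t)^n\,dt.$$

Note that since

$$\int_a^x f'(t)\,dt = f(x) - f(a),$$

then a simple rearrangement gives

$$f(x) = f(a) + \int_a^x f'(t)\,dt,$$

which is the above statement when $n = 0$. We take now as our induction
hypothesis, that

$$f(x) = f(a) + f'(a)(x-a) + \cdots + \frac{f^{(n-1)}(a)}{(n-1)!}(x-a)^{n-1} + \int_a^x \frac{f^{(n)}(t)}{(n-1)!}(x-t)^{n-1}\,dt$$

is true.
We evaluate the integral using integration by parts with the substitution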


22Very few textbooks at this level provide a proof; however, since the two main ingredients are 
induction and integration by parts, I felt that giving the proof would be instructive reading for the 
serious student. 
$$u = f^{(n)}(t), \qquad dv = \frac{(x-t)^{n-1}}{(n-1)!}\,dt,$$

$$du = f^{(n+1)}(t)\,dt, \qquad v = -\frac{(x-t)^n}{n!}.$$

From the above, one obtains

$$\int_a^x \frac{f^{(n)}(t)}{(n-1)!}(x-t)^{n-1}\,dt
= \left[-\frac{f^{(n)}(t)}{n!}(x-t)^n\right]_a^x + \int_a^x \frac{f^{(n+1)}(t)}{n!}(x-t)^n\,dt
= \frac{f^{(n)}(a)}{n!}(x-a)^n + \int_a^x \frac{f^{(n+1)}(t)}{n!}(x-t)^n\,dt.$$

Plugging this into the induction hypothesis shows that the original
statement is correct for all $n$.

Next, note that if $F = F(t)$ is a continuous function of $t$, then one
has that

$$\int_a^b F(t)\,dt = F(c)(b - a)$$

for some number $c$ between $a$ and $b$. (Indeed $F(c)$ is the average value
of $F$ on the interval $[a, b]$.) Using the substitution $u = (x - t)^{n+1}$, and
applying the above observation, we have

$$(n+1)\int_a^x F(t)(x-t)^n\,dt = \int_0^{(x-a)^{n+1}} F\left(x - \sqrt[n+1]{u}\right) du
= F\left(x - \sqrt[n+1]{\alpha}\right)(x - a)^{n+1},$$

where $\alpha$ is between 0 and $(x - a)^{n+1}$. If we set $c = x - \sqrt[n+1]{\alpha}$ we see
that $c$ is between $a$ and $x$ and that

$$(n+1)\int_a^x F(t)(x-t)^n\,dt = F(c)(x - a)^{n+1}.$$

Now set $F(t) = \frac{f^{(n+1)}(t)}{n!}$ and apply the above to infer that

$$\int_a^x \frac{f^{(n+1)}(t)}{n!}(x-t)^n\,dt = \frac{f^{(n+1)}(c)}{(n+1)!}(x - a)^{n+1}.$$

This completes the proof.
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The above remainder (i.e., error term) is called the Lagrange form 
of the error. 


We’ll conclude this subsection with some examples. 


EXAMPLE 1. As a warm-up, let's prove that the Maclaurin series for
$\cos x$ actually converges to $f(x) = \cos x$ for all $x$. We have, by Taylor's
Theorem with Remainder, that

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots \pm \frac{x^{2n}}{(2n)!} + \frac{f^{(2n+1)}(c)}{(2n+1)!}x^{2n+1}$$

for some real number $c$ between 0 and $x$. Since all derivatives of $\cos x$
are $\pm\sin x$ or $\pm\cos x$, we see that $|f^{(2n+1)}(c)| \le 1$. This means that for
fixed $x$, if we let $n \to \infty$, then the remainder

$$\frac{f^{(2n+1)}(c)}{(2n+1)!}x^{2n+1} \to 0,$$

proving that

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \cdots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}.$$
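
The remainder estimate used in Example 1 can be compared with the actual error. In the sketch below (names invented for this note) the degree-$2n$ Maclaurin polynomial of $\cos x$ is evaluated and its error is set against the bound $|x|^{2n+1}/(2n+1)!$ that follows from $|f^{(2n+1)}(c)| \le 1$.

```python
import math


def cos_maclaurin(x: float, n: int) -> float:
    """Maclaurin polynomial of cos of degree 2n, evaluated at x."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(n + 1))


def lagrange_bound(x: float, n: int) -> float:
    """Bound |x|**(2n+1) / (2n+1)! on the remainder after the degree-2n term."""
    return abs(x) ** (2 * n + 1) / math.factorial(2 * n + 1)


def actual_error(x: float, n: int) -> float:
    """Absolute difference between cos(x) and its degree-2n Maclaurin polynomial."""
    return abs(math.cos(x) - cos_maclaurin(x, n))


if __name__ == "__main__":
    for x in (0.5, 2.0, 5.0):
        for n in (1, 3, 6, 10):
            print(x, n, actual_error(x, n), lagrange_bound(x, n))
```

For each fixed $x$ the bound shrinks to zero as $n$ grows, since the factorial eventually outpaces the power of $|x|$.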


EXAMPLE 2. Here's a similar example. By Taylor's Theorem with
Remainder, we have for $f(x) = \ln(1 + x)$, that

$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots \pm \frac{x^n}{n} + \frac{f^{(n+1)}(c)}{(n+1)!}x^{n+1},$$

for some real number $c$ between 0 and $x$. It is easy to verify that the
Maclaurin series for $\ln(1+x)$ has interval of convergence $-1 < x \le 1$, so
we need to insist that $x$ is in this interval. Since
$f^{(n+1)}(c) = \frac{(-1)^n\, n!}{(1+c)^{n+1}}$,
we see that the error term satisfies

$$\left|\frac{f^{(n+1)}(c)}{(n+1)!}x^{n+1}\right| = \left|\frac{n!\,x^{n+1}}{(1+c)^{n+1}(n+1)!}\right| = \left|\frac{x^{n+1}}{(1+c)^{n+1}(n+1)}\right|.$$

In this case, as long as $-\frac{1}{2} \le x \le 1$, then we are guaranteed that
$\left|\frac{x^{n+1}}{(1+c)^{n+1}}\right| \le 1$. Therefore, as $n \to \infty$ we have

$$\left|\frac{x^{n+1}}{(1+c)^{n+1}(n+1)}\right| \to 0.$$

Therefore, we at least know that if $-\frac{1}{2} \le x \le 1$, then

$$\ln(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{x^n}{n}.$$

In particular, this proves the fact anticipated on page 278, viz., that

$$\ln(2) = 1 - \frac{1}{2} + \frac{1}{3} - \cdots = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n}.$$

EXAMPLE 3. One easily computes that the first two terms of the
Maclaurin series expansion of $\sqrt{1+x}$ is $1 + \frac{x}{2}$. Let's give an upper
bound on the error

$$\left|\sqrt{1+x} - \left(1 + \frac{x}{2}\right)\right|$$

when $|x| < 0.01$. By Taylor's Theorem with Remainder, we know
that the absolute value of the error is given by $\left|f''(c)\frac{x^2}{2}\right|$, where $c$ is
between 0 and $x$, and where $f(x) = \sqrt{1+x}$. Since
$f''(c) = \frac{-1}{4(1+c)^{3/2}}$,
and since $c$ is between 0 and $x$, we see that $1 + c \ge .99$ and so

$$|f''(c)| = \frac{1}{4(1+c)^{3/2}} \le \frac{1}{4 \times .99^{3/2}} \le .254.$$

This means that the error in the above approximation is no more than

$$\left|f''(c)\frac{x^2}{2}\right| \le .254 \times \frac{(0.01)^2}{2} < .000013.$$

Another way of viewing this result is that, accurate to four decimal
places, $\sqrt{1+x} = 1 + \frac{x}{2}$ whenever $|x| < 0.01$.
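
The bound of Example 3 is easy to confirm by brute force. The sketch below (the grid of sample points and the function name are choices made for this note) evaluates the actual error of the approximation $\sqrt{1+x} \approx 1 + x/2$ on a fine grid over $[-0.01, 0.01]$ and compares the largest value with $.254 \times (0.01)^2/2$.

```python
import math

# Error bound derived in Example 3: .254 * (0.01)^2 / 2
ERROR_BOUND = 0.254 * 0.01 ** 2 / 2


def linear_error(x: float) -> float:
    """Absolute error of approximating sqrt(1 + x) by 1 + x/2."""
    return abs(math.sqrt(1 + x) - (1 + x / 2))


if __name__ == "__main__":
    samples = [i / 100000 for i in range(-1000, 1001)]  # grid on [-0.01, 0.01]
    print(max(linear_error(x) for x in samples), ERROR_BOUND)
```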


EXERCISES 
1. Show that for all $x$, $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$.


SECTION 5.4 POLYNOMIAL APPROXIMATIONS 303 


2. Assume that you have a function f satisfying f(0) = 5 and for 
n>1f™(0) = (not 


(a) Write out P3(x), the third-degree Maclaurin polynomial ap- 
proximation of f(z). 


(b) Write out the Maclaurin series for f(x), including the general 
term. 


(c) Use P3(x) to approximate f (5). 


(d) Assuming that f(c) < } for all ¢ satisfying 0 < ¢ < 3, show 
that 
fa) PG) oe. 


3. The function $f$ has derivatives of all orders for all real numbers $x$.
Assume $f(2) = -3$, $f'(2) = 5$, $f''(2) = 3$, and $f'''(2) = -8$.


(a) Write the third-degree Taylor polynomial for $f$ about $x = 2$
and use it to approximate $f(1.5)$.


(b) The fourth derivative of $f$ satisfies the inequality $|f^{(4)}(x)| \le 3$
for all $x$ in the closed interval $[1.5, 2]$. Use the Lagrange error
bound on the approximation to $f(1.5)$ found in part (a) to
explain why $f(1.5) \ne -5$.


(c) Write the fourth-degree Taylor polynomial, $P(x)$, for $g(x) = f(x^2 + 2)$
about $x = 0$. Use $P$ to explain why $g$ must have a
relative minimum at $x = 0$.


4. Let $f$ be a function having derivatives of all orders for all real
numbers. The third-degree Taylor polynomial for $f$ about $x = 2$
is given by


(a) Find $f(2)$ and $f''(2)$.


(b) Is there enough information given to determine whether f has 
a critical point at « = 2? If not, explain why not. If so, deter- 
mine whether f(2) is a relative maximum, a relative minimum, 
or neither, and justify your answer. 
(c) The fourth derivative of $f$ satisfies the inequality $|f^{(4)}(x)| \le 6$
for all $x$ in the closed interval $[0, 2]$. Use the Lagrange error
bound on the approximation to $f(0)$ found in part (c) to
explain why $f(0)$ is negative.


5. (a) Using mathematical induction, together with l'Hôpital's rule,
prove that $\lim_{x\to\infty} \frac{P_n(x)}{e^x} = 0$ where $P_n(x)$ is a polynomial of
degree $n$. Conclude that for any polynomial of degree $n$,

$$\lim_{x\to\pm\infty} \frac{P_n(x)}{e^{x^2}} = 0.$$

(b) Show that $\lim_{x\to 0} \frac{P(1/x)}{e^{1/x^2}} = 0$, where $P$ is a polynomial. (Let $y = \frac{1}{x}$,
and note that as $x \to 0$, $y \to \pm\infty$.)


(c) Let $f(x) = e^{-1/x^2}$, $x \ne 0$ and show by induction that
$f^{(n)}(x) = Q_n\left(\frac{1}{x}\right) e^{-1/x^2}$, where $Q_n$ is some polynomial (though
not necessarily of degree $n$).


(d) Conclude from parts (b) and (c) that

$$\lim_{x\to 0} \frac{d^n}{dx^n} f(x) = 0$$

for all $n \ge 0$.


(e) What does all of this say about the Maclaurin series for $e^{-1/x^2}$?


5.5 Differential Equations 


In this section we shall primarily consider first-order ordinary²³
differential equations (ODE), that is, differential equations of the form
$y' = F(x, y)$. If the function $F$ is linear in $y$, then the ODE is called
a linear ordinary differential equation. A linear differential equation
is therefore expressible in the form $y' = p(x)y + q(x)$, where $p$ and $q$
are functions defined on some common domain. In solving an ODE,
we expect that an arbitrary constant will come as the result of an
integration and would be determined by specifying an initial value of the

23 To be distinguished from "partial" differential equations.
solution $y = y(x)$. This results in the initial value problem of the
form

$$y' = p(x)y + q(x), \quad y(a) = y_0.$$

A good example of a commonly-encountered nonlinear ODE is the
so-called logistic differential equation,

$$y' = ky(1 - y), \quad y(0) = y_0.$$


5.5.1 Slope fields 


A good way of getting a preliminary feel for differential equations is 
through their slope fields. That is, given the differential equation 
(not necessarily linear) in the form y’ = F (x,y) we notice first that 
the slope at the point (x,y) of the solution curve y = y(z) is given by 
F (x,y). Thus, we can represent this by drawing at (x,y) a short line 
segment of slope F(x, y). If we do this at enough points, then a visual 
image appears, called the slope field of the ODE. Some examples 
will clarify this; we shall be using the graphics software Autograph to 
generate slope fields. 
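The construction just described is easy to imitate in code. The following Python sketch is only an illustration (it is not how Autograph works, and the names `slope_field` and `length`, as well as the sample grid and the sample right-hand side $F(x, y) = y - x$, are our own assumptions): for each grid point it returns the endpoints of a short segment, centered at the point, whose slope is $F(x, y)$.

```python
import math

def slope_field(F, xs, ys, length=0.3):
    """Return one short segment ((x0, y0), (x1, y1)) of slope F(x, y) per grid point."""
    segments = []
    for x in xs:
        for y in ys:
            m = F(x, y)
            # Half of the segment points along (1, m), scaled so the
            # whole segment has the requested length.
            dx = 0.5 * length / math.sqrt(1.0 + m * m)
            dy = m * dx
            segments.append(((x - dx, y - dy), (x + dx, y + dy)))
    return segments

def F(x, y):
    # Sample right-hand side (an assumption made for illustration).
    return y - x

grid = [i * 0.5 for i in range(-4, 5)]   # -2.0, -1.5, ..., 2.0
segments = slope_field(F, grid, grid)
```

Feeding the list `segments` to any plotting routine that draws line segments produces a picture of the slope field.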


EXAMPLE 1. Consider the ODE $\dfrac{dy}{dx} = y - x$. The slope field is indicated
below:

[Figure: slope field of $y' = y - x$.]

From the above slope field it appears that there might be a linear
solution of the form $y = mx + b$. We can check this by substituting
into the differential equation:

$$m = \frac{d}{dx}(mx + b) = mx + b - x,$$

which immediately implies that $m = 1$ and $b = m = 1$; that is, $y = x + 1$ is a solution.


EXAMPLE 2. The logistic differential equation $\dfrac{dy}{dx} = 3y(1 - y)$ has
slope field given to the right. What we should be able to see from this
picture is that if we have an initial condition of the form $y(0) = y_0 > 0$,
then the solution $y = y(x)$ will satisfy $\lim_{x\to\infty} y = 1$.


EXERCISES

1. For each of the slope fields given below, sketch a solution curve
$y = y(x)$ passing through the initial point $(x_0, y_0)$.

[Figure: slope fields for Exercise 1.]

2. Show that $y = Ke^{2x} - \frac{1}{4}(2x + 1)$ is a solution of the linear ODE
$y' = 2y + x$ for any value of the constant $K$.

3. Find a first-order linear ODE having $y = x^2 + 1$ as a solution.
(There are many answers.)

4. In each case below, verify that the linear ODE has the given function
as a solution.

(a) $xy' + y = 3x^2$, $y = x^2$.

(b) $y' + 2xy = 0$, $y = e^{-x^2}$.

(c) $2x^2y'' + 3xy' - y = 0$, $y = \sqrt{x}$, $x > 0$.

5. Consider the n-th order linear ODE with constant coefficients:

$$y^{(n)} + a_{n-1}y^{(n-1)} + \cdots + a_1y' + a_0y = 0. \qquad (5.1)$$

Assume that the associated characteristic polynomial

$$C(x) = x^n + a_{n-1}x^{n-1} + \cdots + a_1x + a_0$$

has a real zero $\alpha$, i.e., that $C(\alpha) = 0$. Show that a solution of the
ODE (5.1) is $y = e^{\alpha x}$.


5.5.2 Separable and homogeneous first-order ODE 


Most students having had a first exposure to differential and integral 
calculus will have studied separable first-order differential equations. 
These are of the form 


$$\frac{dy}{dx} = f(x)\,g(y),$$

whose solution is derived by an integration:

$$\int \frac{dy}{g(y)} = \int f(x)\,dx.$$


EXAMPLE 1. Solve the differential equation $\dfrac{dy}{dx} = -2yx$.


SOLUTION. From

$$\int \frac{dy}{y} = -\int 2x\,dx,$$

we obtain

$$\ln|y| = -x^2 + C,$$

where $C$ is an arbitrary constant. Taking the natural exponential of
both sides results in $|y| = e^{-x^2 + C} = e^C e^{-x^2}$. However, if we define
$K = e^C$, and if we allow $K$ to take on negative values, then there is no
longer any need to write $|y|$; we have, therefore, the general solution

$$y = Ke^{-x^2},$$

which describes a rapidly-decreasing exponential function of $x$. The slope field, together
with the particular solution with the initial condition $y(0) = 2$, is indicated to
the right.

[Figure: slope field of $y' = -2yx$ together with the particular solution satisfying $y(0) = 2$.]
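A general solution obtained by separating variables can always be checked by substituting it back into the ODE. The short Python sketch below does this numerically for $y = Ke^{-x^2}$ and $y' = -2yx$; the helper names `solution` and `residual`, the central-difference step `h`, and the sample points are our own choices, not part of the text.

```python
import math

def solution(x, K):
    """Candidate general solution y = K e^{-x^2}."""
    return K * math.exp(-x * x)

def residual(x, K, h=1e-6):
    """y'(x) - (-2 y x), with y' estimated by a central difference."""
    dydx = (solution(x + h, K) - solution(x - h, K)) / (2.0 * h)
    return dydx - (-2.0 * solution(x, K) * x)

# The initial condition y(0) = 2 forces K = 2.
K = 2.0
worst = max(abs(residual(x, K)) for x in (-1.5, -0.5, 0.0, 0.7, 2.0))
```

The quantity `worst` should be zero up to the error of the difference quotient; negative values of `K` pass the same check, in line with the remark about dropping the absolute value.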


Some first-order ODE are not separable as they stand, but through 
a change of variables can be transformed into a separable ODE. Such 
is the case of ODE of the form 


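$$\frac{dy}{dx} = F\!\left(\frac{y}{x}\right), \qquad (5.2)$$

for some function $F$. A change of dependent variable

$$v = \frac{y}{x}$$

will accomplish this. To see this, we note that

$$y = vx, \qquad \frac{dy}{dx} = x\frac{dv}{dx} + v;$$

with respect to $x$ and $v$ the ODE (5.2) becomes

$$x\frac{dv}{dx} + v = F(v).$$

The variables $x$ and $v$ separate easily, resulting in the ODE

$$\frac{1}{F(v) - v}\,\frac{dv}{dx} = \frac{1}{x},$$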


which can be solved in principle as above. 


EXAMPLE 2. The first-order ODE

$$2x^2\frac{dy}{dx} = x^2 + y^2$$

can be reduced to the form (5.2) by dividing both sides by $2x^2$:

$$\frac{dy}{dx} = \frac{x^2 + y^2}{2x^2} = \frac{1}{2}\left(1 + \left(\frac{y}{x}\right)^2\right).$$

Setting $v = \frac{y}{x}$ as above reduces the above ODE to

$$x\frac{dv}{dx} + v = \frac{1}{2}\left(v^2 + 1\right),$$

that is,

$$2x\frac{dv}{dx} = v^2 - 2v + 1.$$

After separating the variables, we arrive at the equation

$$\int \frac{2\,dv}{(v - 1)^2} = \int \frac{dx}{x}.$$

Integrating and simplifying yields

$$v = 1 - \frac{2}{\ln|x| + 2C}.$$

Replace $v$ by $y/x$, set $c = 2C$ and arrive at the final general solution

$$y = x - \frac{2x}{\ln|x| + c}.$$
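Since several algebraic steps were compressed in Example 2, it is reassuring to substitute the final answer back into $2x^2 y' = x^2 + y^2$. The following Python sketch does so numerically; the function names, the difference step, and the sample values of $x$ and $c$ are assumptions made purely for illustration.

```python
import math

def y(x, c):
    """General solution found in Example 2: y = x - 2x / (ln|x| + c)."""
    return x - 2.0 * x / (math.log(abs(x)) + c)

def residual(x, c, h=1e-6):
    """2 x^2 y' - (x^2 + y^2), with y' estimated by a central difference."""
    dydx = (y(x + h, c) - y(x - h, c)) / (2.0 * h)
    return 2.0 * x * x * dydx - (x * x + y(x, c) ** 2)

worst = max(abs(residual(x, c)) for x in (1.0, 2.0, 3.0) for c in (1.0, 5.0))
```

The sample points are chosen so that $\ln|x| + c$ stays away from zero, where the solution is undefined.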


We define a function $F(x, y)$ to be homogeneous of degree $k$ if
for all real numbers $t$ such that $(tx, ty)$ is in the domain of $F$ we have
$F(tx, ty) = t^k F(x, y)$. Therefore, the function $F(x, y) = x^2 + y^2$ is
homogeneous of degree 2, whereas the function $F(x, y) = \sqrt{x}/y$ is
homogeneous of degree $-\frac{1}{2}$.


A first-order homogeneous ODE is of the form

$$M(x, y)\frac{dy}{dx} + N(x, y) = 0,$$

where $M(x, y)$ and $N(x, y)$ are both homogeneous of the same degree.
These are important since they can always be reduced to the form (5.2).
Indeed, suppose that $M$ and $N$ are both homogeneous of degree $k$.
Then we work as follows:

$$\frac{dy}{dx} = -\frac{N(x, y)}{M(x, y)} = -\frac{x^k N(1, y/x)}{x^k M(1, y/x)} = -\frac{N(1, y/x)}{M(1, y/x)} = F\!\left(\frac{y}{x}\right),$$

which is of the form (5.2), as claimed. Note that Example 2 above is
an example of a homogeneous first-order ODE.


EXERCISES 
In the following problems, find both the general solution as well as 
the particular solution satisfying the initial condition. 


1. $y' = 2xy^2$, $y(0) = -1$
2. S 2G, 40) 1 


3. 3y*y' = (1+ y”)cosz, y(0) =1 
5.5.3 Linear first-order ODE; integrating factors


In this subsection we shall consider the general first-order linear ODE: 


$$y' + p(x)y = q(x). \qquad (5.3)$$

As we'll see momentarily, these are, in principle, very easy to solve.
The trick is to multiply both sides of (5.3) by the integrating factor

$$\mu(x) = e^{\int p(x)\,dx}.$$

Notice first that $\mu(x)$ satisfies $\mu'(x) = p(x)\mu(x)$. Therefore if we multiply
(5.3) through by $\mu(x)$ we infer that

$$\frac{d}{dx}\bigl(\mu(x)y\bigr) = \mu(x)y' + p(x)\mu(x)y = \mu(x)q(x),$$

from which we may conclude that

$$\mu(x)y = \int \mu(x)q(x)\,dx.$$

EXAMPLE 1. Find the general solution of the first-order ODE

$$(x + 1)y' - y = x, \qquad x > -1.$$

First of all, in order to put this into the general form (5.3) we must
divide everything by $x + 1$:

$$y' - \frac{1}{x + 1}\,y = \frac{x}{x + 1}.$$

This implies that an integrating factor is

$$\mu(x) = e^{-\int \frac{dx}{x + 1}} = \frac{1}{x + 1}.$$

Multiply through by $\mu(x)$ and get

$$\frac{y}{x + 1} = \int \frac{x\,dx}{(x + 1)^2} = \int \left(\frac{1}{x + 1} - \frac{1}{(x + 1)^2}\right) dx = \ln(x + 1) + \frac{1}{x + 1} + c.$$

It follows, therefore, that

$$y = (x + 1)\ln(x + 1) + 1 + c(x + 1),$$

where $c$ is an arbitrary constant.
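The integrating-factor recipe can also be carried out numerically, which gives an independent check on Example 1. In the Python sketch below (a rough illustration, not a production solver) we take $\mu(x) = e^{\int_{x_0}^{x} p}$, so that $\mu(x_0) = 1$ and $y(x) = \bigl(y_0 + \int_{x_0}^{x} \mu q\bigr)/\mu(x)$, and we evaluate both integrals by Simpson's rule. The names `simpson` and `solve_linear`, the number of subintervals, and the choice of initial condition $y(0) = 1$ (that is, $c = 0$) are our own assumptions.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def solve_linear(p, q, x0, y0, x, n=200):
    """Value at x of the solution of y' + p(x) y = q(x), y(x0) = y0."""
    def mu(t):
        # Integrating factor normalized so that mu(x0) = 1.
        return math.exp(simpson(p, x0, t, n))
    integral = simpson(lambda t: mu(t) * q(t), x0, x, n)
    return (y0 + integral) / mu(x)

# Example 1 in standard form: p(x) = -1/(x+1), q(x) = x/(x+1).
p = lambda t: -1.0 / (t + 1.0)
q = lambda t: t / (t + 1.0)

numeric = solve_linear(p, q, 0.0, 1.0, 1.0)
closed_form = 2.0 * math.log(2.0) + 1.0      # (x+1) ln(x+1) + 1 at x = 1, with c = 0
```

The two values `numeric` and `closed_form` should agree to within the accuracy of the quadrature.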


EXERCISES 
1. Solve the following first-order ODE. 

(a) ry’ + 2y = 227, y(1) =0 
(b) 2x7y'+4ry=e*, y(2)=1. 
(c) xy’ + (x — 2)y = 32e*, y(1) =0 
(d) y/Inz + , ie aly 0 
(e) y' + (cotx)y = 3sinxcosz, y(0)=1 
(f) 2(2+1)y’—y=227(2+1), y(2)=0 


2. The first-order Bernoulli ODE are of the form

$$y' + p(x)y = q(x)y^n,$$

where $n$ is any number other than 1. Show that the substitution
$u = y^{1-n}$ brings the above Bernoulli equation into the first-order
linear ODE

$$\frac{1}{1 - n}\,u' + p(x)u = q(x).$$

3. Solve the Bernoulli ODE


5.5.4 Euler’s method 


In this final subsection we shall discuss a rather intuitive numerical
approach to solving a first-order ODE of the form $y' = F(x, y)$, $y_0 = y(x_0)$.
What we do here is to specify a step size, say $h$, and proceed to
approximate $y(x_1), y(x_2), y(x_3), \ldots$, where $x_1 = x_0 + h$, $x_2 = x_1 + h = x_0 + 2h$,
and so on.


The idea is that we use the first-order approximation

$$y(x_1) \approx y(x_0) + y'(x_0)(x_1 - x_0) = y(x_0) + y'(x_0)h.$$

Notice that $y'(x_0) = F(x_0, y_0)$; we set $y_1 = y(x_0) + F(x_0, y_0)h$, giving
the approximation $y(x_1) \approx y_1$. We continue:

$$\begin{aligned}
y(x_2) &\approx y(x_1) + y'(x_1)(x_2 - x_1) && \text{(first-order approximation)} \\
&\approx y_1 + y'(x_1)h && \text{(since } y(x_1) \approx y_1\text{)} \\
&\approx y_1 + F(x_1, y_1)h && \text{(since } F(x_1, y(x_1)) \approx F(x_1, y_1)\text{)}.
\end{aligned}$$

Continuing in this fashion, we see that the approximation $y(x_{n+1}) \approx y_{n+1}$
at the new point $x = x_{n+1}$ is computed from the previous approximation
$y(x_n) \approx y_n$ at the point $x_n$ via

$$y(x_{n+1}) \approx y_{n+1} = y_n + F(x_n, y_n)h.$$

EXAMPLE 1. Approximate the solution of the ODE $y' = x + y - 1$, $y(0) = 1$
on the interval $[0, 2]$ using step size $h = 0.2$; note that
$F(x, y) = x + y - 1$.

We can tabulate the results:

 n    x_n    y_n = y_{n-1} + F(x_{n-1}, y_{n-1})h
 0    0.0    1
 1    0.2    1
 2    0.4    1.04
 3    0.6    1.128
 4    0.8    1.2736
 5    1.0    1.4883
 6    1.2    1.7860
 7    1.4    2.1832
 8    1.6    2.6998
 9    1.8    3.3598
10    2.0    4.1917
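The recursion $y_{n+1} = y_n + F(x_n, y_n)h$ translates directly into a loop. The following Python sketch (the function name `euler` and its argument order are our own choices) regenerates the entries of the table in Example 1.

```python
def euler(F, x0, y0, h, steps):
    """Return the list of points (x_n, y_n), n = 0, ..., steps, from Euler's method."""
    points = [(x0, y0)]
    x, y = x0, y0
    for n in range(1, steps + 1):
        y = y + F(x, y) * h      # y_n = y_{n-1} + F(x_{n-1}, y_{n-1}) h
        x = x0 + n * h           # recompute x_n from x0 to limit rounding drift
        points.append((x, y))
    return points

def F(x, y):
    return x + y - 1.0

table = euler(F, 0.0, 1.0, 0.2, 10)
for n, (x, y) in enumerate(table):
    print(f"{n:2d}  {x:.1f}  {y:.4f}")
```

Halving `h` and doubling `steps` is an easy experiment for seeing how the approximations respond to the step size.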


The figure below compares the exact solution with the approxima- 
tions generated above. 


[Figure: slope field of $y' = x + y - 1$, showing the true solution and the Euler approximations.]


EXERCISES

1. Give the exact solution of $y' = x + y - 1$, $y(0) = 1$. Tabulate these
exact values against their approximations in the table below:

 n    x_n    y_n = y_{n-1} + F(x_{n-1}, y_{n-1})h    y(x_n)
 0    0.0    1
 1    0.2    1
 2    0.4    1.04
 3    0.6    1.128
 4    0.8    1.2736
 5    1.0    1.4883
 6    1.2    1.7860
 7    1.4    2.1832
 8    1.6    2.6998
 9    1.8    3.3598
10    2.0    4.1917


2. Use the Euler method with $h = 0.1$ to find approximate values for
the solution of the initial-value problem over the interval $[1, 2]$

$$xy' + y = 3x^2, \qquad y(1) = -2.$$

Then solve exactly and compare against the approximations.


3. Do the same over the interval [0,1], (h = 0.1) for 


1 
y = 2xy+ i, (0) =A; 
Chapter 6

Inferential Statistics


We shall assume that the student has had some previous exposure to 
elementary probability theory; here we'll just gather together some rec- 
ollections. 


The most important notion is that of a random variable; while 
we won’t give a formal definition here we can still convey enough of its 
root meaning to engage in useful discussions. Suppose that we are to 
perform an experiment whose outcome is a numerical value X. That 
X is a variable follows from the fact that repeated experiments are 
unlikely to produce the same value of X each time. For example, if we 
are to toss a coin and let 


$$X = \begin{cases} 1 & \text{if heads,} \\ 0 & \text{if tails,} \end{cases}$$


then we have a random variable. Notice that this variable X does not 
have a value until after the experiment has been performed! 


The above is a good example of a discrete random variable in 
that there are only two possible values of X: X = 0 and X = 1. By 
contrast, consider the experiment in which I throw a dart at a two-dimensional
target and let X measure the distance of the dart to the
center (bull’s eye). Here, X is still random (it depends on the throw), 
but can take on a whole continuum of values. Thus, in this case we call 
X a continuous random variable. 




318 CHAPTER 6 INFERENTIAL STATISTICS 


6.1 Discrete Random Variables 


Let’s start with an example which is probably familiar to everyone. We 
take a pair of fair dice and throw them, letting X be the sum of the 
dots showing. Of course, X is random as it depends on the outcome 
of the experiment. Furthermore X is discrete: it can only take on the 
integer values between 2 and 12. Finally, using elementary means it is 
possible to compute the probability that $X$ assumes any one of these
values. If we denote by $P(X = x)$ the probability that $X$ assumes the
value $x$, then the probabilities $P(X = x)$, $2 \le x \le 12$, can be computed and tabulated as below:

x          2     3     4     5     6     7     8     9     10    11    12
P(X = x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

The table above summarizes the distribution of the discrete random
variable $X$. That is, it summarizes the individual probabilities
$P(X = x)$, where $x$ takes on any one of the allowable values. Furthermore,
using the above distribution, we can compute probabilities of the
form $P(x_1 \le X \le x_2)$; for example

$$P(2 \le X \le 5) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) = \frac{1}{36} + \frac{2}{36} + \frac{3}{36} + \frac{4}{36} = \frac{10}{36}.$$

It is reasonably clear that if $X$ is an arbitrary discrete random variable
whose possible outcomes are $x_1, x_2, x_3, \ldots$, then

$$\sum_{i=1}^{\infty} P(X = x_i) = 1.$$

This is of fundamental importance!
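The distribution of the dice sum can be recovered by simply listing the 36 equally likely outcomes. The Python sketch below does this with exact fractions; the function name `dice_sum_distribution` and the variable names are our own.

```python
from fractions import Fraction
from itertools import product

def dice_sum_distribution():
    """Map each possible sum x of two fair dice to P(X = x)."""
    dist = {}
    for a, b in product(range(1, 7), repeat=2):
        dist[a + b] = dist.get(a + b, Fraction(0)) + Fraction(1, 36)
    return dist

dist = dice_sum_distribution()
total = sum(dist.values())                      # all probabilities together
p_2_to_5 = sum(dist[x] for x in range(2, 6))    # P(2 <= X <= 5)
```

Here `total` comes out to 1 and `p_2_to_5` to 10/36 (stored in lowest terms), in agreement with the table.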


6.1.1 Mean, variance, and their properties 


We define the mean $\mu_X$ (or expectation $E(X)$)¹ of the discrete random
variable $X$ by setting

$$\mu_X = E(X) = \sum_i x_i P(X = x_i),$$

where the sum is over all possible values $x_i$ which the random variable $X$
can assume. As we'll see, the above is often an infinite series! This value
can be interpreted as the average value of $X$ over many observations of
$X$. (We'll give a slightly more precise formulation of this in section ??.)
For example, if $X$ is the random variable associated with the above dice
game, then

$$\begin{aligned}
E(X) &= 2 \times \frac{1}{36} + 3 \times \frac{2}{36} + 4 \times \frac{3}{36} + 5 \times \frac{4}{36} + 6 \times \frac{5}{36} + 7 \times \frac{6}{36} \\
&\quad + 8 \times \frac{5}{36} + 9 \times \frac{4}{36} + 10 \times \frac{3}{36} + 11 \times \frac{2}{36} + 12 \times \frac{1}{36} = 7.
\end{aligned}$$

¹Some authors use the notation $\langle X \rangle$ for the mean of the discrete random variable $X$.


Let $X$ and $Y$ be two discrete random variables; we wish to consider
the mean $E(X + Y)$ of the sum $X + Y$. While it's probably intuitively
plausible, if not downright obvious, that $E(X + Y) = E(X) + E(Y)$,
this still deserves a proof.²

So we assume that $X$ and $Y$ are discrete random variables having
means $E(X) = \mu_X$ and $E(Y) = \mu_Y$, respectively. Of fundamental
importance to the ensuing analysis is that for any value $x$, the
probabilities $P(X = x)$ can be expressed in terms of conditional probabilities³
on $Y$:

$$P(X = x) = \sum_{j=1}^{\infty} P(X = x \mid Y = y_j)\,P(Y = y_j). \qquad (6.1)$$

Likewise, the probabilities $P(Y = y)$ can be similarly expressed in
terms of conditional probabilities on $X$:

$$P(Y = y) = \sum_{i=1}^{\infty} P(Y = y \mid X = x_i)\,P(X = x_i). \qquad (6.2)$$

Having noted this, we now proceed:

$$\begin{aligned}
\mu_{X+Y} &= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} (x_i + y_j)\,P(X = x_i \text{ and } Y = y_j) \\
&= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} x_i\,P(X = x_i \text{ and } Y = y_j) + \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} y_j\,P(X = x_i \text{ and } Y = y_j) \\
&= \sum_{i=1}^{\infty}\sum_{j=1}^{\infty} x_i\,P(X = x_i \mid Y = y_j)\,P(Y = y_j) + \sum_{j=1}^{\infty}\sum_{i=1}^{\infty} y_j\,P(Y = y_j \mid X = x_i)\,P(X = x_i) \\
&= \sum_{i=1}^{\infty} x_i\,P(X = x_i) + \sum_{j=1}^{\infty} y_j\,P(Y = y_j) \qquad \text{by (6.1) and (6.2)} \\
&= \mu_X + \mu_Y,
\end{aligned}$$

proving that

$$E(X + Y) = E(X) + E(Y). \qquad (6.3)$$

²Elementary textbooks typically only prove this under the simplifying assumption that $X$ and $Y$
are independent.

³Here, we have assumed that the students have already had some exposure to conditional probabilities.
Recall that for any two events $A$ and $B$ the probability of $A$ conditioned on $B$ is given
by

$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}.$$


Next, note that if $X$ is a random variable and if $a$ and $b$ are constants,
then it's clear that $E(aX) = aE(X)$; from this we immediately infer
(since $b$ can be regarded itself as a random variable with mean $b$) that

$$E(aX + b) = aE(X) + b.$$

Next, we define the variance $\sigma^2$ (or $\mathrm{Var}(X)$) of the random variable
$X$ having mean $\mu$ by setting $\sigma^2 = E((X - \mu)^2)$. The standard deviation
$\sigma$ is the non-negative square root of the variance. The mean
and variance of a random variable are examples of parameters of a
random variable.


We shall derive an alternate (and frequently useful) expression for
the variance of the random variable $X$ with mean $\mu$. Namely, note that

$$\begin{aligned}
\mathrm{Var}(X) &= E((X - \mu)^2) \\
&= E(X^2 - 2\mu X + \mu^2) \\
&= E(X^2) - 2\mu E(X) + \mu^2 && \text{(by (6.3))} \\
&= E(X^2) - \mu^2. && (6.4)
\end{aligned}$$

We turn now to the variance of the discrete random variable $X + Y$.
In this case, however, we require that $X$ and $Y$ are independent.⁴ This
means that for all values $x$ and $y$ we have

$$P(X = x \text{ and } Y = y) = P(X = x)\,P(Y = y).$$

In order to derive a useful formula for $\mathrm{Var}(X + Y)$, we need the result
that given $X$ and $Y$ are independent, then $E(XY) = E(X)E(Y)$; see
Exercise 1, below. Using (6.4), we have

$$\begin{aligned}
\mathrm{Var}(X + Y) &= E((X + Y)^2) - \mu_{X+Y}^2 \\
&= E((X + Y)^2) - (\mu_X + \mu_Y)^2 \\
&= E(X^2 + 2XY + Y^2) - (\mu_X + \mu_Y)^2 \\
&= E(X^2) + E(2XY) + E(Y^2) - (\mu_X^2 + 2\mu_X\mu_Y + \mu_Y^2) \\
&= E(X^2) - \mu_X^2 + 2E(X)E(Y) - 2\mu_X\mu_Y + E(Y^2) - \mu_Y^2 \\
&= \mathrm{Var}(X) + \mathrm{Var}(Y). && (6.5)
\end{aligned}$$

As you might expect, the above formula is false in general (i.e., when
$X$ and $Y$ are not independent); see Exercise 1, below. Using (6.5), we see
immediately that if $X$ is a discrete random variable, and if $Y = aX + b$,
where $a$ and $b$ are real numbers, then we may regard $b$ as a (constant)
random variable, certainly independent of the random variable $aX$.
Therefore,

$$\mathrm{Var}(Y) = \mathrm{Var}(aX + b) = \mathrm{Var}(aX) + \mathrm{Var}(b) = a^2\,\mathrm{Var}(X),$$

where we have used the easily-proved facts that $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$
and that the variance of a constant random variable is zero (see Exercises
5 and 6, below).

⁴An equivalent (and somewhat more intuitive) expression can be given in terms of conditional
probabilities. Namely, two events $A$ and $B$ are independent precisely when $P(A \mid B) = P(A)$. In terms
of discrete random variables $X$ and $Y$, this translates into $P(X = x \mid Y = y) = P(X = x)$ for any
possible values $x$ of $X$ and $y$ of $Y$.


We conclude this section with a brief summary of properties of mean
and variance for discrete random variables.⁵


• If $X$ is a random variable, and if $a$, $b$ are real numbers, then
$E(aX + b) = aE(X) + b$.


• If $X$ is a random variable, and if $a$, $b$ are real numbers, then
$\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$.


• If $X$ and $Y$ are random variables, then
$E(X + Y) = E(X) + E(Y)$.


• If $X$ and $Y$ are independent random variables, then
$E(XY) = E(X)E(Y)$.


• If $X$ and $Y$ are independent random variables, then
$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y)$.
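The properties just listed can be confirmed on a small example by exact enumeration. In the Python sketch below, $X$ and $Y$ are the results of two independent fair dice; the helper names `mean`, `variance`, and `distribution`, and the choice $a = 3$, $b = 2$, are assumptions made for illustration only.

```python
from fractions import Fraction
from itertools import product

def mean(dist):
    """E(X) for a distribution given as {value: probability}."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E((X - mu)^2)."""
    mu = mean(dist)
    return sum((x - mu) ** 2 * p for x, p in dist.items())

def distribution(values_with_probs):
    """Collect (value, probability) pairs into a {value: probability} table."""
    dist = {}
    for value, prob in values_with_probs:
        dist[value] = dist.get(value, Fraction(0)) + prob
    return dist

die = {face: Fraction(1, 6) for face in range(1, 7)}
# Independence: the joint probability of (a, b) is the product pa * pb.
pairs = [((a, b), pa * pb) for (a, pa), (b, pb) in product(die.items(), repeat=2)]

sum_dist = distribution((a + b, p) for (a, b), p in pairs)     # X + Y
prod_dist = distribution((a * b, p) for (a, b), p in pairs)    # XY
scaled = distribution((3 * a + 2, p) for a, p in die.items())  # 3X + 2
```

Comparing `mean(sum_dist)` with `2 * mean(die)`, `variance(sum_dist)` with `2 * variance(die)`, `mean(prod_dist)` with `mean(die) ** 2`, and so on, reproduces each bullet exactly, since all arithmetic is done with fractions.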


6.1.2 Weak law of large numbers (optional discussion) 


In order to get a better feel for the meaning of the variance, we include
the following two lemmas:


⁵These same properties are also true for continuous random variables!

LEMMA. (Markov's Inequality) Let $X$ be a non-negative discrete random
variable. Then for any number $d > 0$, we have

$$P(X \ge d) \le \frac{1}{d}E(X).$$

PROOF. We define a new random variable $Y$ by setting

$$Y = \begin{cases} d & \text{if } X \ge d, \\ 0 & \text{otherwise.} \end{cases}$$

Since $Y \le X$, it follows that $E(X) \ge E(Y)$. Also note that $Y$ has two
possible values: 0 and $d$; furthermore,

$$E(Y) = dP(Y = d) = dP(X \ge d).$$

Since $E(X) \ge E(Y) = dP(X \ge d)$, the result follows immediately.


LEMMA. (Chebyshev's Inequality) Let $X$ be a discrete random variable
with mean $\mu$ and variance $\sigma^2$. Then for any $d > 0$ we have

$$P(|X - \mu| \ge d) \le \frac{\sigma^2}{d^2}.$$

PROOF. Define the random variable $Y = (X - \mu)^2$; it follows that
$E(Y) = \sigma^2$. Applying Markov's inequality to $Y$ we have

$$P(|X - \mu| \ge d) = P(Y \ge d^2) \le \frac{1}{d^2}E(Y) = \frac{\sigma^2}{d^2},$$

as required.

We now assume that $X_1, X_2, \ldots, X_n$ are random variables with the
same mean $\mu$; we denote the average of these random variables thus:

$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n}.$$

From what we've proved about the mean, we see already that $E(\overline{X}) = \mu$.
In case the random variables $X_1, X_2, \ldots$ are independent and have the same distribution,
the Weak Law of Large Numbers says a bit more:


LEMMA. (The Weak Law of Large Numbers) Assume that $X_1, X_2, \ldots, X_n, \ldots$
is an infinite sequence of independent, identically distributed random
variables with mean $\mu$ (and having finite variance $\sigma^2$). Then for each
$\epsilon > 0$

$$\lim_{n\to\infty} P\left(\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right| > \epsilon\right) = 0.$$

PROOF. We set $S_n = X_1 + X_2 + \cdots + X_n$, and so $A_n = S_n/n$ has mean
$\mu$ and, by independence and (6.5), variance $\sigma^2/n$. By Chebyshev's Inequality we have

$$P(|A_n - \mu| > \epsilon) \le \frac{\sigma^2}{n\epsilon^2}.$$

Since $\epsilon > 0$ is fixed, the result is now obvious.


Notice that an equivalent formulation of the Weak Law of Large
Numbers is the statement that for all $\epsilon > 0$ we have that

$$\lim_{n\to\infty} P\left(\left|\frac{X_1 + X_2 + \cdots + X_n}{n} - \mu\right| \le \epsilon\right) = 1.$$

As you might expect, there is also a Strong Law of Large Numbers
which is naively obtained by interchanging the limit and probability $P$;
see the footnote.⁶
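To see how the bound in the proof behaves, one can compare it with an exact computation in a simple case. In the Python sketch below the $X_i$ are independent fair coin tosses with values 0 and 1 (so $\mu = 1/2$ and $\sigma^2 = 1/4$); the function names, the value $\epsilon = 1/10$, and the sample sizes are our own assumptions. The exact tail probability is obtained by adding up the chances of each possible number of heads.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, eps):
    """Exact P(|A_n - 1/2| > eps), where A_n is the average of n fair 0/1 coin tosses."""
    total = Fraction(0)
    for k in range(n + 1):                       # k = number of heads
        if abs(Fraction(k, n) - Fraction(1, 2)) > eps:
            total += Fraction(comb(n, k), 2 ** n)
    return total

def chebyshev_bound(n, eps):
    """sigma^2 / (n eps^2) with sigma^2 = 1/4 for a fair coin."""
    return Fraction(1, 4) / (n * eps ** 2)

eps = Fraction(1, 10)
for n in (10, 100, 1000):
    print(n, float(tail_probability(n, eps)), float(chebyshev_bound(n, eps)))
```

The exact probabilities fall well below the Chebyshev bound, which is typical: the inequality is crude, but it is enough to force the limit to be zero.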


⁶That is to say, if $X_1, X_2, \ldots, X_n, \ldots$ is an infinite sequence of independent, identically distributed random
variables with mean $\mu$, then

$$P\left(\lim_{n\to\infty} \frac{X_1 + X_2 + \cdots + X_n}{n} = \mu\right) = 1.$$

There is no requirement of finiteness of the variances.


EXERCISES


1. Prove that if $X$ and $Y$ are discrete independent random variables,
then $E(XY) = E(X)E(Y)$. Is this result still true if $X$ and $Y$ are
not independent?


2. Suppose that we draw two cards in succession, and without replacement,
from a standard 52-card deck. Define the random variables
$X_1$ and $X_2$ by setting

$$X_1 = \begin{cases} 1 & \text{if the first card drawn is red} \\ 0 & \text{if the first card drawn is black;} \end{cases}$$

similarly,

$$X_2 = \begin{cases} 1 & \text{if the second card drawn is red} \\ 0 & \text{if the second card drawn is black.} \end{cases}$$

(a) Are $X_1$ and $X_2$ independent random variables?
(b) Compute $P(X_1 = 1)$.
(c) Compute $P(X_2 = 1)$.


3. Suppose that we have two dice and let $D_1$ be the result of rolling
die 1 and $D_2$ the result of rolling die two. Show that the random
variables $D_1 + D_2$ and $D_1$ are not independent. (This seems pretty
obvious, right?)


4. We continue the assumptions of the above exercise and define the
new random variable $T$ by setting

$$T = \begin{cases} 1 & \text{if } D_1 + D_2 = 7 \\ 0 & \text{if } D_1 + D_2 \ne 7. \end{cases}$$

Show that $T$ and $D_1$ are independent random variables. (This
takes a bit of work.)


5. Let $X$ be a discrete random variable and let $a$ be a real number.
Prove that $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.


6. Let $X$ be a constant-valued random variable. Prove that $\mathrm{Var}(X) = 0$.
(This is very intuitive, right?)


7. John and Eric are to play the following game with a fair coin.
John begins by tossing the coin; if the result is heads, he wins and
the game is over. If the result is tails, he hands the coin over to
Eric, who then tosses the coin. If the result is heads, Eric wins;
otherwise he returns the coin to John. They keep playing until
someone wins by tossing a head.
(a) What is the probability that Eric wins on his first toss?
(b) What is the probability that John wins the game?
(c) What is the probability that Eric wins the game?

8. Let $n$ be a fixed positive integer. Show that for a randomly-selected
positive integer $x$, the probability that $x$ and $n$ are relatively prime
is $\dfrac{\phi(n)}{n}$. (Hint: see Exercise 20 on page 64.)

9. Consider the following game. Toss a fair coin, until the first head
is reached. The payoff is simply $2^n$ dollars, where $n$ is the number
of tosses needed until the first head is reached. Therefore, the
payoffs are

No. of tosses   1     2     3     ...   n
Payoff          \$2   \$4   \$8   ...   \$2^n

How much would you be willing to pay to play this game? \$10? \$20? Ask
a friend; how much would she be willing to pay to play this game? Note
that the expected value of this game is infinite!⁷


6.1.3. The random harmonic series (optional discussion) 


We close this section with an interesting example from analysis. We
saw on page 265 that the harmonic series $\sum_{n=1}^{\infty} \frac{1}{n}$ diverges, and on page 278
we saw that the alternating harmonic series $\sum_{n=1}^{\infty} \frac{(-1)^{n-1}}{n}$ converges (to
$\ln 2$; see page 302). Suppose now that $\epsilon_1, \epsilon_2, \ldots$ is a random sequence
of $+1$s and $-1$s, where we regard each $\epsilon_k$ as a random variable with
$P(\epsilon_k = 1) = P(\epsilon_k = -1) = 1/2$. Therefore each $\epsilon_k$ has mean 0 and
variance 1. What is the probability that $\sum_{n=1}^{\infty} \frac{\epsilon_n}{n^{2/3}}$ converges?

We can give an intuitive idea of how one can analyze this question,
as follows. We start by setting $X_k = \frac{\epsilon_k}{k^{2/3}}$, $k = 1, 2, \ldots$, and note that
$E(X_k) = 0$ and $\mathrm{Var}(X_k) = \frac{1}{k^{4/3}}$. Now set

$$S_n = \sum_{k=1}^{n} X_k = \sum_{k=1}^{n} \frac{\epsilon_k}{k^{2/3}}.$$

It follows immediately (the $\epsilon_k$ being independent) that $S_n$ has mean 0 and (finite) variance

$$\sum_{k=1}^{n} \frac{1}{k^{4/3}} < 4. \qquad \text{(See footnote 8.)}$$

Under these circumstances it follows that the above infinite sum $\sum_{k=1}^{\infty} X_k$
actually converges with probability 1.⁹ Furthermore, the same arguments
can be applied to show that as long as $p > 1/2$, then the random
series $\sum_{n=1}^{\infty} \frac{\epsilon_n}{n^p}$ also converges with probability 1.

We turn now to some relatively commonly-encountered discrete random
variables, the geometric, the binomial, the negative binomial,
the hypergeometric, and the Poisson random variables.

⁷Thus, we have a paradox, often called the St. Petersburg paradox.

⁸Note that $\sum_{k=1}^{n} \frac{1}{k^{4/3}} < \sum_{k=1}^{\infty} \frac{1}{k^{4/3}} < 1 + \int_1^{\infty} x^{-4/3}\,dx = 4$.

⁹This can be inferred from the Kolmogorov Three-Series Theorem; see, e.g., Theorem 22.8
of P. Billingsley, Probability and Measure, 2nd ed., John Wiley & Sons, New York, 1986.


6.1.4 The geometric distribution


Consider the following game (experiment). We start with a coin whose
probability of heads is $p$; therefore the probability of tails is $1 - p$. The
game we play is to keep tossing the coin until a head is obtained. The
random variable $X$ is the number of trials until the game ends. The
distribution for $X$ is as follows:

x          1    2          3            ...   n                ...
P(X = x)   p    p(1 - p)   p(1 - p)^2   ...   p(1 - p)^{n-1}   ...

Therefore, the expectation of $X$ is given by the infinite sum:

$$\begin{aligned}
E(X) &= \sum_{n=1}^{\infty} nP(X = n) = \sum_{n=1}^{\infty} np(1 - p)^{n-1} = p\sum_{n=1}^{\infty} n(1 - p)^{n-1} \\
&= p\,\frac{d}{dx}\left(1 + x + x^2 + \cdots\right)\Big|_{x=1-p} \\
&= p\,\frac{d}{dx}\left(\frac{1}{1 - x}\right)\Big|_{x=1-p} \\
&= p\,\frac{1}{(1 - x)^2}\Big|_{x=1-p} = \frac{1}{p},
\end{aligned}$$

which implies that the mean of the geometric random variable $X$ is
given by

$$E(X) = p\sum_{n=1}^{\infty} n(1 - p)^{n-1} = \frac{1}{p}.$$

Notice that the smaller $p$ becomes, the longer the game is expected to
last.


Next, we turn to the variance of $X$. By Equation 6.4 we have

$$\mathrm{Var}(X) = E(X^2) - \frac{1}{p^2} = p\sum_{n=1}^{\infty} n^2(1 - p)^{n-1} - \frac{1}{p^2}.$$

Next,

$$\begin{aligned}
p\sum_{n=1}^{\infty} n^2(1 - p)^{n-1} &= p\sum_{n=1}^{\infty} n(n - 1)(1 - p)^{n-1} + p\sum_{n=1}^{\infty} n(1 - p)^{n-1} \\
&= p(1 - p)\sum_{n=1}^{\infty} n(n - 1)(1 - p)^{n-2} + \frac{1}{p} \\
&= p(1 - p)\,\frac{d^2}{dx^2}\left(1 + x + x^2 + \cdots\right)\Big|_{x=1-p} + \frac{1}{p} \\
&= p(1 - p)\,\frac{d^2}{dx^2}\left(\frac{1}{1 - x}\right)\Big|_{x=1-p} + \frac{1}{p} \\
&= \frac{2(1 - p)}{p^2} + \frac{1}{p} = \frac{2 - p}{p^2}.
\end{aligned}$$

Therefore,

$$\mathrm{Var}(X) = \frac{2 - p}{p^2} - \frac{1}{p^2} = \frac{1 - p}{p^2}.$$
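The formulas $E(X) = 1/p$ and $\mathrm{Var}(X) = (1 - p)/p^2$ for the geometric distribution can be checked by brute force: simply add up enough terms of the series. The Python sketch below does this; the function name `geometric_moments`, the cutoff `terms`, and the sample value $p = 0.25$ are our own assumptions.

```python
def geometric_moments(p, terms=10_000):
    """Approximate (E(X), Var(X)) for the geometric distribution by partial sums."""
    mean = 0.0
    second_moment = 0.0
    prob = p                          # P(X = 1) = p
    for n in range(1, terms + 1):
        mean += n * prob
        second_moment += n * n * prob
        prob *= 1.0 - p               # P(X = n + 1) = P(X = n)(1 - p)
    # Var(X) = E(X^2) - E(X)^2, as in (6.4)
    return mean, second_moment - mean ** 2

mean, var = geometric_moments(0.25)   # theory predicts 1/p = 4 and (1 - p)/p^2 = 12
```

Because the terms decay geometrically, the truncated sums agree with the closed forms to many decimal places.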


6.1.5 The binomial distribution 


In this situation we perform $n$ independent trials, where each trial has
two outcomes (call them success and failure). We shall let $p$ be the
probability of success on any trial, so that the probability of failure on
any trial is $1 - p$. The random variable $X$ is the total number of successes
out of the $n$ trials. This implies, of course, that the distribution of $X$
is summarized by writing

$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k}, \qquad 0 \le k \le n.$$

The mean and variance of $X$ are very easily computed once we realize
that $X$ can be expressed as a sum of $n$ independent Bernoulli random
variables. The Bernoulli random variable $B$ is what models the tossing
of a coin: it has outcomes 0 and 1 with probabilities $1 - p$ and $p$,
respectively. Very simple calculations show that

$$E(B) = p \quad \text{and} \quad \mathrm{Var}(B) = p(1 - p).$$

Next, if $X$ is the binomial random variable with success probability $p$,
then we may write

$$X = B_1 + B_2 + \cdots + B_n,$$

where each $B_i$ is a Bernoulli random variable. It follows easily from
what we already proved above that

$$E(X) = E(B_1) + E(B_2) + \cdots + E(B_n) = np,$$

and

$$\mathrm{Var}(X) = \mathrm{Var}(B_1) + \mathrm{Var}(B_2) + \cdots + \mathrm{Var}(B_n) = np(1 - p).$$
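The values $np$ and $np(1 - p)$ can also be obtained directly from the binomial probabilities $\binom{n}{k}p^k(1 - p)^{n-k}$, without the decomposition into Bernoulli variables. The Python sketch below carries out the sums exactly for one choice of parameters; the function name `binomial_pmf` and the values $n = 10$, $p = 1/3$ are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """Table {k: P(X = k)} for the number of successes in n independent trials."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

n, p = 10, Fraction(1, 3)
pmf = binomial_pmf(n, p)

total = sum(pmf.values())
mean = sum(k * q for k, q in pmf.items())
var = sum(k * k * q for k, q in pmf.items()) - mean ** 2   # formula (6.4)
```

With exact fractions the results are `total == 1`, `mean == n * p`, and `var == n * p * (1 - p)`, with no rounding involved.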


6.1.6 Generalizations of the geometric distribution 


Generalization 1: The negative binomial distribution 


Suppose that we are going to perform a number $X$ of Bernoulli trials,
each with success probability $p$, stopping after exactly $r$ successes have
occurred. Then it is clear that

$$P(X = x) = \binom{x - 1}{r - 1} p^r (1 - p)^{x - r}, \qquad x = r, r + 1, r + 2, \ldots$$

In order to compute the mean and variance of $X$ note that $X$ is easily
seen to be the sum of independent geometric random variables $G_1, G_2, \ldots, G_r$, where
the success probability of each is $p$:

$$X = G_1 + G_2 + \cdots + G_r.$$

Using the results of (6.1.4) we have, for each $i = 1, 2, \ldots, r$, that

$$E(G_i) = \frac{1}{p}, \qquad \mathrm{Var}(G_i) = \frac{1 - p}{p^2}.$$

It follows, therefore, that the mean and variance of the negative binomial
random variable $X$ are given by

$$E(X) = \frac{r}{p}, \qquad \mathrm{Var}(X) = \frac{r(1 - p)}{p^2}.$$

The name "inverse binomial" would perhaps be more apt, as the
following direct comparison with the binomial distribution reveals:


Binomial Random Variable X            Negative Binomial Random Variable Y
number of successes                   number of trials
in n trials                           needed for r successes

E(X) = np, Var(X) = np(1 - p)         E(Y) = r/p, Var(Y) = r(1 - p)/p^2
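As a check on the negative binomial formulas, one can add up the probabilities $\binom{x-1}{r-1}p^r(1 - p)^{x-r}$ numerically. The Python sketch below does this for $r = 3$ and $p = 0.4$; the function name, the cutoff `max_trials`, and the parameter values are our own assumptions, and the sums are truncated rather than exact.

```python
from math import comb

def negative_binomial_moments(r, p, max_trials=2000):
    """Approximate (total probability, E(X), Var(X)) by summing up to max_trials."""
    total = mean = second = 0.0
    for x in range(r, max_trials + 1):
        # P(X = x): the r-th success occurs on trial x
        prob = comb(x - 1, r - 1) * p ** r * (1.0 - p) ** (x - r)
        total += prob
        mean += x * prob
        second += x * x * prob
    return total, mean, second - mean ** 2

total, mean, var = negative_binomial_moments(r=3, p=0.4)
# theory predicts r/p = 7.5 and r(1 - p)/p^2 = 11.25
```

That `total` comes out as 1 (to rounding error) confirms that the formula for $P(X = x)$ really does define a distribution.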


Generalization 2: The coupon problem 


Suppose that in every cereal box there is a “prize,” and that there are, 
in all, three possible prizes. Assume that in a randomly purchased 
box of cereal the probability of winning any one of the prizes is the 
same, namely 1/3. How many boxes of cereal would you expect to buy 
in order to have won all three prizes? It turns out that the natural 
analysis is to use a sum of geometric random variables. 

We start by defining three independent random variables $X_1$, $X_2$,
and $X_3$, as follows. $X_1$ is the number of trials to get the first new prize;
note that $X_1$ is really not random, as the only possible value of $X_1$
is 1. Nonetheless, we may regard $X_1$ as a geometric random variable
with probability $p = 1$. $X_2$ is the number of trials (boxes) needed to
purchase in order to get the second new prize, after the first prize is
already won. Clearly $X_2$ is also a geometric random variable, this time
with $p = 2/3$. Finally, $X_3$ is the number of boxes needed to purchase to
get the third (last) new prize after we already have two distinct prizes,
and so $X_3$ is geometric with $p = 1/3$. Therefore, if $X$ is the number of
boxes purchased before collecting the complete set of three prizes, then
$X = X_1 + X_2 + X_3$, which represents $X$ as a sum of geometric random
variables.

From the above, computing $E(X)$ is now routine:

$$E(X) = E(X_1 + X_2 + X_3) = E(X_1) + E(X_2) + E(X_3) = 1 + \frac{3}{2} + 3 = \frac{11}{2}.$$

The generalization of the above problem to that of finding the expected
number of boxes needed to purchase before collecting all of $n$
different prizes should now be routine! The answer in this case is

$$E(X) = 1 + \frac{n}{n - 1} + \frac{n}{n - 2} + \cdots + \frac{n}{2} + n = n\sum_{k=1}^{n} \frac{1}{k}.$$
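The stage-by-stage reasoning of the coupon problem is short enough to compute exactly. In the Python sketch below (the function name `expected_boxes` is our own), stage $i$ is the wait for a new prize when $i$ distinct prizes are already in hand; that wait is geometric with success probability $(n - i)/n$ and hence has mean $n/(n - i)$.

```python
from fractions import Fraction

def expected_boxes(n):
    """Expected number of boxes needed to collect all n equally likely prizes."""
    # Sum of the means n/(n - i) of the n geometric waiting times.
    return sum(Fraction(n, n - i) for i in range(n))

three = expected_boxes(3)    # the case treated in the text
six = expected_boxes(6)
```

For three prizes the result is 11/2, as found above; for larger $n$ the value $n\left(1 + \frac{1}{2} + \cdots + \frac{1}{n}\right)$ grows only slightly faster than $n$ itself.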


Generalization 3: Fixed sequences of binary outcomes 


In section 6.1.4 we considered the experiment in which a coin is repeat- 
edly tossed with the random variable X measuring the number of times 
before the first occurrence of a head. In the present section we modify 
this to ask such questions such as: 


e what is the expected number of trials before obtaining two heads 
in a row?, or 


e what is the expected number of trials before seeing the sequence 
HT? 


What makes the above questions interesting is that on any two tosses 
of a fair coin, whereas the probability of obtaining the sequences HH 
and HT are the same, the expected waiting times before seeing these 
sequences differ. The methods employed here can, in principle, be 
applied to the study of any pre-determined sequence of “heads” and 
“tails.” 
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In order to appreciate the method em- 

ployed, let’s again consider the geo- (B =1 and the game 
metric distribution. That is, assume is is over) 

that the probability of flipping a head —_ (B = 0 and we start 
(H) is p, and that X measures the 4 _ the experiment over 
number of trials before observing the eee as sates 
first head. We may write X = B+(1— wertameds 
B)(1+Y), where B is the Bernoulli 

random variable with P(B = 1) = p 

and P(B = 0) = 1 — >», and where 

Y and X have the same distribution. 

(See the tree diagram to the right.) 


pL 


It follows, therefore that 


E(X) = £E(B+(1-B)(1+Y)) 
= E(B)+ E(1— B)E(1+Y) (since B and Y are independent) 
1—p)1+ E(X)) 


| 
RS 


Solving for E(X) quickly yields the correct result, viz., F(X) = 1/p. 


The above method quickly generalizes to sequences. Let's consider tossing a coin with P(heads) = p, stopping after two consecutive heads are obtained. Letting X be the number of trials, letting Y have the same distribution as X, and letting \(B_1\) and \(B_2\) be independent Bernoulli random variables (each equal to 1 with probability p), we may set

\[ X = 2B_1B_2 + B_1(1-B_2)(2+Y) + (1-B_1)(1+Y). \tag{6.6} \]

[Tree diagram: HH (\(B_1 = B_2 = 1\) and the game is over); HT (\(B_1 = 1\), \(B_2 = 0\) and we start the experiment over again with two trials already having been performed); T (\(B_1 = 0\) and we start the experiment over again with one trial already having been performed).]
Computing the expectation of both sides of (6.6) quickly yields 


\[ E(X) = 2p^2 + p(1-p)(2 + E(X)) + (1-p)(1 + E(X)), \]


from which it follows that

\[ E(X) = \frac{1+p}{p^2}. \]

Note that if the coin is fair, then the expected waiting time before seeing two heads in a row is 6.


Similar analyses can be applied to computing the expected waiting time before seeing the sequence HT (and similar sequences); see Exercises 8, 9, and 10 on page 341.
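The difference between the two waiting times can also be seen experimentally. The following is a small Python simulation sketch, not taken from the original notes; the helper names `waiting_time` and `average_wait` are hypothetical, and a fair coin is assumed by default.

```python
import random

def waiting_time(pattern, p, rng):
    """Toss a coin with P(H) = p until `pattern` appears; return the number of tosses."""
    recent = ""
    tosses = 0
    while not recent.endswith(pattern):
        recent += "H" if rng.random() < p else "T"
        recent = recent[-len(pattern):]   # only the last few tosses matter
        tosses += 1
    return tosses

def average_wait(pattern, p=0.5, trials=50000, seed=1):
    """Average waiting time for `pattern` over many independent games."""
    rng = random.Random(seed)
    return sum(waiting_time(pattern, p, rng) for _ in range(trials)) / trials

hh = average_wait("HH")
ht = average_wait("HT")
print(hh, ht)
```

For a fair coin the first average should be near 6 and the second near 4 (the value asserted in Exercise 8), even though HH and HT are equally likely on any two given tosses.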


6.1.7 The hypergeometric distribution 


This distribution is modeled by a box containing N marbles, of which n are of a particular type (“successful” marbles), and so there are N − n “unsuccessful” marbles. If we draw k marbles without replacement, and if X is the random variable which measures the number of successful marbles drawn, then X has distribution given by

\[ P(X = m) = \frac{\binom{n}{m}\binom{N-n}{k-m}}{\binom{N}{k}}, \qquad m = 0, 1, 2, \ldots, \min\{n, k\}. \]

From the above it follows that the mean of X is given by the sum

\[ E(X) = \sum_{m=0}^{\min\{n,k\}} m\,\frac{\binom{n}{m}\binom{N-n}{k-m}}{\binom{N}{k}}. \]


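We can calculate the above using simple differential calculus. We note first that

\[ \left[\frac{d}{dx}(x+1)^n\right](x+1)^{N-n} = n(x+1)^{N-1} = \frac{n}{N}\,\frac{d}{dx}(x+1)^N. \]

Now watch this:

\[
\begin{aligned}
\sum_{k=0}^{N}\left[\sum_{m=0}^{n} m\binom{n}{m}\binom{N-n}{k-m}\right]x^k
 &= x\left[\sum_{m=0}^{n} m\binom{n}{m}x^{m-1}\right]\left[\sum_{p=0}^{N-n}\binom{N-n}{p}x^{p}\right] \quad \text{(this takes some thought!)} \\
 &= x\left[\frac{d}{dx}(x+1)^n\right](x+1)^{N-n} \\
 &= \frac{n}{N}\,x\,\frac{d}{dx}(x+1)^N \\
 &= \frac{n}{N}\sum_{k=0}^{N} k\binom{N}{k}x^k;
\end{aligned}
\]

equating the coefficients of \(x^k\) yields

\[ \sum_{m=0}^{n} m\binom{n}{m}\binom{N-n}{k-m} = \frac{nk}{N}\binom{N}{k}. \]

This immediately implies that the mean of the hypergeometric distribution is given by

\[ E(X) = \frac{nk}{N}. \]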

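Next, we observe that

\[ x^2\left[\frac{d^2}{dx^2}(x+1)^n\right](x+1)^{N-n} = n(n-1)x^2(x+1)^{N-2} = \frac{n(n-1)}{N(N-1)}\,x^2\,\frac{d^2}{dx^2}(x+1)^N. \]

Next comes the hard part (especially the first equality):

\[
\begin{aligned}
\sum_{k=0}^{N}\left[\sum_{m=0}^{n} m(m-1)\binom{n}{m}\binom{N-n}{k-m}\right]x^k
 &= \left[\sum_{m=0}^{n} m(m-1)\binom{n}{m}x^{m}\right]\left[\sum_{p=0}^{N-n}\binom{N-n}{p}x^{p}\right] \\
 &= x^2\left[\frac{d^2}{dx^2}(x+1)^n\right](x+1)^{N-n} \\
 &= \frac{n(n-1)}{N(N-1)}\,x^2\,\frac{d^2}{dx^2}(x+1)^N \\
 &= \frac{n(n-1)}{N(N-1)}\sum_{k=0}^{N} k(k-1)\binom{N}{k}x^k.
\end{aligned}
\]

Just as we did at a similar juncture when computing E(X), we equate the coefficients of \(x^k\), which yields the equality

\[ \sum_{m=0}^{n} m(m-1)\binom{n}{m}\binom{N-n}{k-m} = \frac{n(n-1)k(k-1)}{N(N-1)}\binom{N}{k}. \]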


The left-hand sum separates into two sums; solving for the first sum gives

\[ \sum_{m=0}^{n} m^2\binom{n}{m}\binom{N-n}{k-m} = \frac{n(n-1)k(k-1)}{N(N-1)}\binom{N}{k} + \frac{nk}{N}\binom{N}{k}, \]

which implies that

\[ E(X^2) = \frac{n(n-1)k(k-1)}{N(N-1)} + \frac{nk}{N}. \]

Finally, from this we obtain the variance of the hypergeometric distribution:

\[
\begin{aligned}
\mathrm{Var}(X) &= E(X^2) - E(X)^2 \\
 &= \frac{n(n-1)k(k-1)}{N(N-1)} + \frac{nk}{N} - \left(\frac{nk}{N}\right)^2 \\
 &= \frac{nk(N-n)(N-k)}{N^2(N-1)}.
\end{aligned}
\]
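The closed forms for the mean and variance can be checked against a direct summation over the distribution. The sketch below is an illustration only, not part of the original notes; the function names and the particular values N = 20, n = 7, k = 5 are arbitrary choices. Exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(N, n, k):
    """Return {m: P(X = m)} when k marbles are drawn from N, of which n are successful."""
    total = comb(N, k)
    return {m: Fraction(comb(n, m) * comb(N - n, k - m), total)
            for m in range(min(n, k) + 1)}

def mean_and_variance(pmf):
    """Compute E(X) and Var(X) = E(X^2) - E(X)^2 directly from the distribution."""
    mean = sum(m * p for m, p in pmf.items())
    second_moment = sum(m * m * p for m, p in pmf.items())
    return mean, second_moment - mean ** 2

N, n, k = 20, 7, 5
pmf = hypergeometric_pmf(N, n, k)
mean, var = mean_and_variance(pmf)
print(mean, Fraction(n * k, N))
print(var, Fraction(n * k * (N - n) * (N - k), N * N * (N - 1)))
```

Each printed line shows the directly summed value next to the closed-form value; the two should agree exactly.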


6.1.8 The Poisson distribution 


The Poisson random variable can be thought of as the limit of a binomial random variable in the following sense. First of all, assume that Y is the binomial random variable which measures the number of successes in n trials and where the probability of success on each trial is p. As we saw above, the mean of this random variable is \(\mu_Y = np\). Now, rather than limiting the number of trials, we take the limit as \(n \to \infty\) while holding fixed the mean \(\mu = \mu_Y\) (so that \(p = \mu/n\)). We call the resulting random variable the Poisson random variable with mean \(\mu\). If we denote this by X, then the distribution of X is computed as follows:


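\[
\begin{aligned}
P(X = k) &= \lim_{n\to\infty}\binom{n}{k}p^k(1-p)^{n-k} \\
 &= \lim_{n\to\infty}\binom{n}{k}\left(\frac{\mu}{n}\right)^{k}\left(1-\frac{\mu}{n}\right)^{n-k} \\
 &= \lim_{n\to\infty}\frac{n(n-1)\cdots(n-k+1)}{k!}\cdot\frac{\mu^k}{n^k}\left(1-\frac{\mu}{n}\right)^{n-k} \\
 &= \frac{\mu^k}{k!}\lim_{n\to\infty}\left(1-\frac{\mu}{n}\right)^{n-k} \quad \left(\text{since } \lim_{n\to\infty}\frac{n(n-1)\cdots(n-k+1)}{n^k} = 1\right) \\
 &= \frac{\mu^k}{k!}\lim_{n\to\infty}\left(1-\frac{\mu}{n}\right)^{n}\left(1-\frac{\mu}{n}\right)^{-k} \\
 &= \frac{\mu^k}{k!}\lim_{n\to\infty}\left(1-\frac{\mu}{n}\right)^{n}.
\end{aligned}
\]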
Next, note that the limit above is a \(1^\infty\) indeterminate form; taking the natural log and applying l'Hôpital's rule, we have

\[ \lim_{n\to\infty}\left(1 - \frac{\mu}{n}\right)^{n} = e^{-\mu}, \tag{6.7} \]

and so it follows that

\[ P(X = k) = \frac{e^{-\mu}\mu^k}{k!}. \]


This gives the distribution of the Poisson random variable! 
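The limiting process can be watched numerically. The following Python sketch is an illustration only, not part of the original notes; the function names and the values of μ and k are arbitrary. It holds the mean fixed, lets n grow with p = μ/n, and compares the binomial probability with the Poisson probability.

```python
from math import comb, exp, factorial

def binomial_pmf(n, p, k):
    """P(Y = k) for a binomial random variable with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(mu, k):
    """P(X = k) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu**k / factorial(k)

mu, k = 2.3, 3
for n in (10, 100, 1000, 10000):
    # keep the mean n*p fixed at mu while n grows
    print(n, binomial_pmf(n, mu / n, k))
print("Poisson:", poisson_pmf(mu, k))
```

As n increases, the binomial values settle toward the Poisson value.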


The Poisson distribution is often used to model events over time (or space). One typical application is to model traffic accidents (per year) at a particular intersection of two streets. For example, if our traffic data suggests that there are roughly 2.3 accidents/year at this intersection, then we can compute the probability that in a given year there will be fewer than 2 accidents or more than 4 accidents. These translate into the respective probabilities \(P(X \le 1)\) and \(P(X \ge 5)\). Specifically,


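\[
\begin{aligned}
P(X \le 1) &= P(X = 0) + P(X = 1) \\
 &= \frac{e^{-2.3}\,2.3^0}{0!} + \frac{e^{-2.3}\,2.3^1}{1!} \\
 &= e^{-2.3}(1 + 2.3) = 3.3\,e^{-2.3} \approx 0.331.
\end{aligned}
\]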


In the same vein, 


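\[
\begin{aligned}
P(X \ge 5) &= 1 - P(X \le 4) \\
 &= 1 - \left(\frac{e^{-2.3}\,2.3^0}{0!} + \frac{e^{-2.3}\,2.3^1}{1!} + \frac{e^{-2.3}\,2.3^2}{2!} + \frac{e^{-2.3}\,2.3^3}{3!} + \frac{e^{-2.3}\,2.3^4}{4!}\right) \\
 &= 1 - e^{-2.3}\left(1 + 2.3 + \frac{2.3^2}{2!} + \frac{2.3^3}{3!} + \frac{2.3^4}{4!}\right) \\
 &\approx 0.084.
\end{aligned}
\]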
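The two accident probabilities can be recomputed in a few lines. This is a sketch for checking the arithmetic, not part of the original notes; `poisson_cdf` is a hypothetical helper name.

```python
from math import exp, factorial

def poisson_cdf(mu, k):
    """P(X <= k) for a Poisson random variable with mean mu."""
    return sum(exp(-mu) * mu**j / factorial(j) for j in range(k + 1))

mu = 2.3
p_fewer_than_2 = poisson_cdf(mu, 1)       # P(X <= 1)
p_more_than_4 = 1 - poisson_cdf(mu, 4)    # P(X >= 5)
print(round(p_fewer_than_2, 3), round(p_more_than_4, 3))
```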
We expect that the mean of the Poisson random variable is \(\mu\); however, a direct proof is possible as soon as we remember the Maclaurin series expansion for \(e^x\) (see Exercise 1 on page 302). We have that

\[
\begin{aligned}
E(X) &= \sum_{k=0}^{\infty} k\,P(X = k) \\
 &= \sum_{k=0}^{\infty} k\,\frac{e^{-\mu}\mu^k}{k!} \\
 &= e^{-\mu}\sum_{k=1}^{\infty}\frac{\mu^{k}}{(k-1)!} \\
 &= \mu e^{-\mu}\sum_{k=0}^{\infty}\frac{\mu^{k}}{k!} = \mu e^{-\mu}e^{\mu} = \mu,
\end{aligned}
\]

as expected.


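Similarly,

\[
\begin{aligned}
\mathrm{Var}(X) &= E(X^2) - \mu^2 \\
 &= \sum_{k=0}^{\infty} k^2\,P(X = k) - \mu^2 \\
 &= \sum_{k=0}^{\infty} k^2\,\frac{e^{-\mu}\mu^k}{k!} - \mu^2 \\
 &= e^{-\mu}\sum_{k=1}^{\infty} k\,\frac{\mu^{k}}{(k-1)!} - \mu^2 \\
 &= \mu e^{-\mu}\sum_{k=0}^{\infty}(k+1)\frac{\mu^{k}}{k!} - \mu^2 \\
 &= \mu e^{-\mu}\sum_{k=0}^{\infty} k\,\frac{\mu^{k}}{k!} + \mu e^{-\mu}\sum_{k=0}^{\infty}\frac{\mu^{k}}{k!} - \mu^2 \\
 &= \mu^2 + \mu - \mu^2 = \mu.
\end{aligned}
\]

That is to say, \(\mathrm{Var}(X) = \mu = E(X)\).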
EXERCISES


1. Suppose that you are going to toss a fair coin 200 times. Therefore, you know that the expected number of heads obtained is 100, and the variance is 50. If X is the actual number of heads, what does Chebyshev's Inequality say about the probability that X deviates from the mean of 100 by more than 15?


2. Suppose that you are playing an arcade game in which the probability of winning is p = .2.


(a) If you play 100 times, how many games do you expect to win? 


(b) If you play 100 times, what is the probability that you will 
win more than 30 games? 


(c) If you play until you win exactly 20 games, how many games 
will you expect to play? 


(d) If you stop after winning 20 games, what is the probability 
that this happens no later than on the 90-th game? 


3. Prove that the sum of two independent binomial random variables with the same success probability p is also binomial with success probability p.


4. Prove that the result of Exercise 3 is correct if “binomial” is replaced with “negative binomial.”


5. Prove that the sum of two independent Poisson random variables is also Poisson.


6. Suppose that N men check their hats before dinner. However, the clerk then randomly permutes these hats before returning the hats to the N men. What is the expected number of men who will receive their own hats? (This is actually easier than it looks: let \(B_i\) be the Bernoulli random variable which is 1 if the i-th man receives his own hat and 0 otherwise. While \(B_1, B_2, \ldots, B_N\) are not independent (why not?), the expectation of the sum is still the sum of the expectations.)
7. My motorcycle has a really lousy starter; under normal conditions my motorcycle will start with probability 1/3 when I try to start it. Given that I need to recharge my battery after every 200 attempts at starting my motorcycle, compute the probability that I will have to recharge my battery after one month. (Assume that I need to start my motorcycle twice each day.)


8. On page 334 we saw that if we toss a fair coin in succession, the expected waiting time before seeing two heads in a row is 6. Now play the same game, stopping after the sequence HT occurs. Show that the expected length of this game is 4. Does this seem intuitive?


9. Do the same as in the above problem, comparing the waiting times before seeing the sequences THH versus THT. Are the waiting times the same?


10. (A bit harder) Show that on tossing a coin whose probability of heads is p the expected waiting time before seeing k heads in a row is

\[ \frac{1-p^k}{(1-p)p^k}. \]


11. As we have seen, the binomial distribution is the result of witnessing one of two results, often referred to as success and failure. The multinomial distribution is where there is a finite number of outcomes, \(O_1, O_2, \ldots, O_k\). For example we may consider the outcomes to be your final grade in this class: A, B, C, D, or F. Suppose that on any given trial the probability that outcome \(O_i\) results is \(p_i\), \(i = 1, 2, \ldots, k\). Naturally, we must have that \(p_1 + p_2 + \cdots + p_k = 1\). Again, to continue my example, we might assume that my grades are assigned according to a more-or-less traditional distribution:

A: 10%
B: 20%
C: 40%
D: 20%
F: 10%

If we perform n trials, and we denote by \(X_i\) the number of times we witness outcome \(O_i\), then the probabilities in question are of the form \(P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k)\), where \(x_1 + x_2 + \cdots + x_k = n\). A little thought reveals that these probabilities are given by

\[ P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \binom{n}{x_1, x_2, \ldots, x_k}\,p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}, \]

where \(\binom{n}{x_1, x_2, \ldots, x_k}\) is the multinomial coefficient

\[ \binom{n}{x_1, x_2, \ldots, x_k} = \frac{n!}{x_1!\,x_2!\cdots x_k!}. \]

So, suppose that my grading distribution is as above, and that I have 20 students. Compute the following probabilities:

(a) P(3 As, 6 Bs, 8 Cs, 2 Ds, and 1 F)

(b) P(3 or 4 As, 5 or 6 Bs, 8 Cs, 2 Ds, and 1 F)

(c) P(everyone passes (D or better))

(d) P(at most 5 people get As)


12. (Gambler's Ruin) Suppose that we have two players, player A and player B, and that players A and B have between them N dollars. Players A and B now begin their game, where player A tosses a fair coin, winning $1 from B whenever she tosses a head and losing $1 (and giving it to B) whenever she tosses a tail. Holding N fixed, let \(p_i\) = P(A bankrupts B | A started with i dollars). (It is clear that \(p_0 = 0\) and that \(p_N = 1\).)


(a) Let \(E_i\) be the event that A bankrupts B, given that A started with i dollars; then \(P(E_i) = p_i\). Now argue that

\[
\begin{aligned}
p_i &= P(E_i \mid \text{A wins the first game})\,P(\text{A wins the first game}) \\
    &\quad + P(E_i \mid \text{B wins the first game})\,P(\text{B wins the first game}) \\
    &= \tfrac{1}{2}p_{i+1} + \tfrac{1}{2}p_{i-1}.
\end{aligned}
\]


(b) From the above, obtain \(p_i = i\,p_1\), \(i = 1, 2, \ldots, N\).
(c) That is, if A starts with a dollars and B starts with b dollars, then the probability that A bankrupts B is \(\dfrac{a}{a+b}\).


The point of the above is that if A plays against someone with a lot of capital, like a casino, then the probability that A eventually goes bankrupt is very close to one, even if the game is fair! This is known as gambler's ruin.


13. Generalize the results of Exercise 12 to the case when the probability of tossing a head is p. That is, compute the probability that A bankrupts B, given that A starts with a dollars and B has N − a dollars.


14. (An open-ended question) Note that the Poisson distribution with mean 2 and the geometric distribution with p = .5 both have the same mean and variance. How do these distributions compare to each other? Try drawing histograms of both. Note that the same can be said for the Poisson distribution with mean 2k and the negative binomial (p = .5, stopping at the k-th success).


15. Suppose we have a large urn containing 350 white balls and 650 blue balls. We select (without replacement) 20 balls from this urn. What is the probability that exactly 5 are white? Does this experiment differ significantly from an appropriately-chosen model based on the binomial distribution? What would the appropriate binomial approximation be?


16. Suppose that we have a large urn containing 1000 balls, exactly 50 of which are white (the rest are blue). Select 20 balls. Without knowing whether the selection was with or without replacement, estimate


(a) the expected number of white balls in the sample; 


(b) the probability that you selected at most 2 white balls (using 
a Poisson model); 


(c) the probability that you selected at most 2 white balls (using 
a hypergeometric model); 
(d) the probability that you selected at most 2 white balls (using a binomial model).


17. Let X be the random variable associated with the coupon problem 
(see page 331), where n is the number of prizes involved. Compute 
the variance of X. 


18. Consider the following Minitab-generated histogram of 200 trials, 
where one stops after winning all of m = 5 prizes. 


[Histogram: 200 trials, where each trial stops after winning all of 5 prizes; mean = 10.96, StDev = 4.82.]


(a) Is the sample mean close to the theoretical mean obtained 
above? 


(b) How close does this histogram appear to that of a Poisson 
distribution (with the theoretical mean)? 


(c) The TI code below will simulate playing the above game N times where M is the total number of prizes. The result of playing the N games is stored in list variable L₃. Have fun with it!


PROGRAM: PRIZES
:Input "NO OF PRIZES: ", M
:Input "NO OF TRIALS: ", N
:0→A
:For(I,1,N)
:For(L,1,M)
:0→L₂(L)
:End
:1→B
:For(J,1,M)
:B*L₂(J)→B
:End
:0→C
:While B<1
:C+1→C
:randInt(1,M)→D
:L₂(D)+1→L₂(D)
:1→B
:For(J,1,M)
:B*L₂(J)→B
:End
:End
:C→L₃(I)
:End
:Stop


19. Let X be the binomial random variable with success probability p and where X measures the number of successes in n trials. Define the new random variable Y by setting Y = 2X − n. Show that Y = y, \(-n \le y \le n\), can be interpreted as the total earnings after n games, where in each game we win $1 with each success and we lose $1 with each failure. Compute the mean and variance of Y.


20. Continuing with the random variable Y given above, let T be the random variable which measures the number of trials needed to first observe Y = 1. In other words, T is the number of trials needed in order to first observe one's cumulative earnings reach $1. Therefore \(P(T = 1) = p\), \(P(T = 2) = 0\), \(P(T = 3) = p^2(1-p)\). Show that \(P(T = 2k+1) = C(k)\,p^{k+1}(1-p)^k\), where \(C(k) = \dfrac{1}{k+1}\dbinom{2k}{k}\), \(k = 0, 1, 2, \ldots\), are the Catalan numbers.

21. We continue the thread of Exercise 20, above. Show that if p = 1/2 (so the game is fair) then the expected time to first earn $1 is infinite! We'll outline two approaches here: a short (clever?) approach and a more direct approach.


(a) Let E be the expected waiting time and use a tree diagram as on page 333 to show that \(E = \frac{1}{2} + \frac{1}{2}(2E + 1)\), which implies that E must be infinite.


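(b) Here we'll give a nuts and bolts direct approach.¹⁰ Note first that the expectation E is given by

\[ E = \sum_{k=0}^{\infty}\frac{(2k+1)\,C(k)}{2^{2k+1}}, \quad \text{where } C(k) = \frac{1}{k+1}\binom{2k}{k},\ k = 0, 1, 2, \ldots. \]

(i) Show that \(C(k) = \dfrac{2\cdot 6\cdot 10\cdots(4k-2)}{(k+1)!}\), \(k \ge 1\). (This is a simple induction.)¹¹

(ii) Conclude that \(C(k) = \dfrac{1}{k+1}\displaystyle\prod_{m=1}^{k}\left(4 - \frac{2}{m}\right)\).

(iii) Conclude that \(C(k)\,2^{-(2k+1)} = \dfrac{1}{2(k+1)}\displaystyle\prod_{m=1}^{k}\left(1 - \frac{1}{2m}\right)\).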
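(iv) Using the fact that \(\ln x \ge 1 - \dfrac{1}{x}\) for all \(x > 0\), show that \(\ln\left(1 - \dfrac{1}{2m}\right) \ge -\dfrac{1}{2m-1}\), \(m = 1, 2, \ldots\).

(v) Conclude that

\[ \prod_{m=1}^{k}\left(1 - \frac{1}{2m}\right) = \exp\left(\sum_{m=1}^{k}\ln\left(1 - \frac{1}{2m}\right)\right) \ge \exp\left(-\sum_{m=1}^{k}\frac{1}{2m-1}\right) \ge e^{-(1+\ln k)} = \frac{1}{ek} \]

(see Exercise 5 on page 269).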
(vi) Finish the proof that \(E = \infty\) by showing that the series for E is asymptotically a multiple of the divergent harmonic series.

22. Here are two more simple problems where Catalan numbers ap- 
pear. 


¹⁰ I am indebted to my lifelong friend and colleague Robert Burckel for fleshing out most of the details.


¹¹ This still makes sense if k = 0, for then the numerator is the “empty product,” and hence is 1.
(a) Suppose that we wish to move along a square grid (a 6 × 6 grid is shown to the right) where we start from the extreme northwest vertex (A) and move toward the extreme southeast vertex (B) in such a way that we always move “toward” the objective, i.e., each move is either to the right (east) or down (south). A moment's thought reveals that there are \(\binom{12}{6}\) such paths. What is the probability that a random path from A to B will always be above or on the diagonal drawn from A to B? (Answer: For the grid to the right the probability is \(C(6)/\binom{12}{6} = 1/7\).) This result generalizes in the obvious way to n × n grids.

[Figure: a 6 × 6 square grid with northwest vertex A and southeast vertex B.]


(b) Suppose this time that we have 2n people, each wishing to purchase a $10 theater ticket. Exactly n of these people have only a $10 bill, and the remaining n people have only a $20 bill. The person selling tickets at the ticket window has no change. What is the probability that a random lineup of these 2n people will allow the ticket seller to make change from the incoming receipts? (This means, for instance, that the first person buying a ticket cannot be one of the people having only a $20 bill.)


23. Suppose that we have a room with n politicians and that they 
are going to use the following “democratic” method for selecting 
a leader. They will distribute n identical coins, each having the 
probability of heads being p. The n politicians each toss their 
respective coins in unison; if a politician’s coin comes up heads, 
and if all of the others come up tails, then this politician becomes 
the leader. Otherwise, they all toss their coins again, repeating 
until a leader has been chosen. 


(a) Show that the probability of a leader being chosen in a given round is \(np(1-p)^{n-1}\).


(b) Show that the maximum probability for a leader to be chosen in a given round occurs when p = 1/n.


(c) Show that if \(n \gg 0\), and if p = 1/n, then the probability that a leader is chosen in a given round is \(\approx 1/e\). (See Equation 6.7, page 338.)


24. Suppose that in a certain location, the average annual rainfall is regarded as a continuous random variable, and that the rainfalls from year to year are independent of each other. Prove that in n years the expected number of record rainfall years is given by the harmonic series

\[ 1 + \frac{1}{2} + \frac{1}{3} + \cdots + \frac{1}{n}. \]


6.2 Continuous Random Variables 


Your TI graphing calculator has a random-number generator. It’s called 
rand; find it! Invoking this produces a random number. Invoking this 
again produces another. And so on. What’s important here is that 
rand represents a continuous (or nearly so!) random variable. 


Let's look a bit closer at the output of rand. Note first that the random numbers generated are real numbers between 0 and 1. Next, note that the random numbers are independent, meaning that the value of one occurrence of rand has no influence on any other occurrence of rand.¹²


A somewhat more subtle observation is that rand is a uniformly distributed random variable. What does this mean? Does it mean that, for example,

\[ P(\text{rand} = .0214) = P(\text{rand} = 1/\pi)? \]

You will probably convince yourselves that this is not the meaning, as it is almost surely true that both sides of the above are 0, regardless of the meaning of uniformity! What uniformity means is that for any two numbers \(x_1\) and \(x_2\) and any small number \(\epsilon\) satisfying \(x_1 \pm \epsilon,\ x_2 \pm \epsilon \in [0, 1]\),

\[ P(x_1 - \epsilon \le \text{rand} \le x_1 + \epsilon) = P(x_2 - \epsilon \le \text{rand} \le x_2 + \epsilon). \]


¹² Of course, this isn't the technical definition of “independence.” A slightly more formal definition of independence of random variables X and Y is that

\[ P(a \le X \le b \text{ and } c \le Y \le d) = P(a \le X \le b)\,P(c \le Y \le d). \]


A much simpler description of the above is through the so-called density function for the random variable rand. This has the graph given below:

[Figure: graph of the density function of rand, the constant 1 on the interval from 0 to 1.]

The way to interpret this (and any other density curve y = f(x)) is that the probability of finding a value of the corresponding random variable X between the values a and b is simply the area under the density curve from a to b:

\[ P(a \le X \le b) = \int_a^b f(x)\,dx. \]

For the uniform distribution this means simply that, for example, P(rand ≤ 2/3) = 2/3, that P(rand > .25) = .75, P(.05 ≤ rand < .6) = .55, and so on.


Let's consider another continuous random variable, defined in terms of its density function. Namely, let X be the random variable whose density function \(y = f_X(t)\) is as given below:

[Figure: graph of the density function \(y = f_X(t) = 2t\) for \(0 \le t \le 1\).]

Two important observations are in order.


(a) For any observation x of X, \(0 \le x \le 1\).


(b) \(\displaystyle\int_0^1 f_X(x)\,dx = 1\).


(See Exercise 1, below.)


We see that the above density curve has quite a bit of “skew” to it; 
in particular it’s clear that a random measurement of X is much more 
likely to produce a value greater than .5 than less than .5. 


6.2.1 The normal distribution 


The normal density function has the general form

\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \]

where \(\mu\) and \(\sigma\) are constants, or parameters¹³ of the distribution. The graph is indicated below for \(\mu = 1\) and \(\sigma = 2\):

[Figure: normal density curve with mean = 1 and variance = 4.]


¹³ We'll have much more to say about parameters of a distribution. In fact, much of our statistical study will revolve around the parameters.
In this case the normal random variable X can assume any real value. Furthermore, it is true (but not entirely trivial to show by elementary means) that

\[ \int_{-\infty}^{\infty} f(x)\,dx = 1, \]

which by now we should recognize as being a basic property of any density function.


EXAMPLE. Our graphing calculators allow for sampling from normal distributions, via randNorm(μ, σ, n), where n is the number of independent samples taken. The calculator operation

    randNorm(1, 2, 200) → L₁

amounts to selecting 200 samples from a normally-distributed population having \(\mu = 1\) and \(\sigma = 2\). The same can be done in Autograph; the results of such a sample are indicated below:

[Histogram: 200 independent samples taken from an underlying normal distribution with mean 1 and variance 4; sample mean = 1.165, sample variance = 4.153.]
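For readers without a TI calculator or Autograph, the same experiment can be sketched in Python using only the standard library. This is an illustration, not part of the original notes; the seed is an arbitrary choice, so the sample statistics will differ from those quoted above.

```python
import random
import statistics

rng = random.Random(2024)                          # arbitrary seed, for reproducibility
samples = [rng.gauss(1, 2) for _ in range(200)]    # mean 1, standard deviation 2

sample_mean = statistics.mean(samples)
sample_variance = statistics.variance(samples)     # divides by n - 1
print(sample_mean, sample_variance)
```

The sample mean and sample variance should be in the neighborhood of 1 and 4, respectively, though with only 200 samples some deviation is to be expected.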


6.2.2 Densities and simulations 


In the above we had quite a bit to say about density functions and about 
sampling from the uniform and normal distributions. We’ll continue 
this theme here. 


Let's begin with the following question. Suppose that X is the uniform random number generator on our TI calculators: X = rand. Let's define a new random variable by setting \(Y = \sqrt{X} = \sqrt{\text{rand}}\). What does the underlying density curve look like? Is it still uniform as in the case of X? Probably not, but let's take a closer look. Before getting too heavily into the mathematics let's start by collecting 200 samples of \(\sqrt{\text{rand}}\) and drawing a histogram. This will give us a general idea of what the underlying density curve might look like. Collecting the samples is easy:

    √(rand(200)) → L₁

puts 200 samples from this distribution into the TI list variable L₁. Likewise, this sampling is easily done using more advanced software such as Autograph or Minitab. Below is an Autograph-produced histogram of these 200 samples.

[Figure: histogram of the 200 samples, with horizontal axis running from 0 to 1.]

We would suspect on the basis of this histogram that the underlying density curve is not uniform but has considerable skew. Intuitively, we could have seen this simply by noting that any real number x satisfying 0 < x < 1 also satisfies \(x < \sqrt{x}\); this is what causes the histogram to skew to the left.


Can we make this more precise? Yes, and it's not too difficult. If we set X = rand, \(Y = \sqrt{\text{rand}}\), we have that for any value of t, \(0 \le t \le 1\),

\[ P(Y \le t) = P(X \le t^2) = t^2 \]

(since X is uniformly distributed on [0, 1]). In other words, if \(f_Y\) is the density function for Y, then it follows that

\[ \int_0^t f_Y(x)\,dx = P(Y \le t) = t^2; \]

differentiating both sides with respect to t and applying the Fundamental Theorem of Calculus, we get

\[ f_Y(t) = 2t. \]

Of course, this is the density function given a few pages earlier. In summary, the square root of the uniform random-number generator has a linear density function given by f(t) = 2t.
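The conclusion can be tested empirically: if the square root of rand has density 2t, then the fraction of samples that are at most t should be close to the square of t. The following Python sketch is an illustration, not part of the original notes; `empirical_cdf` is a hypothetical helper name and the sample size is arbitrary.

```python
import random

rng = random.Random(7)
samples = [rng.random() ** 0.5 for _ in range(20000)]   # observations of sqrt(rand)

def empirical_cdf(data, t):
    """Fraction of the observations that are at most t."""
    return sum(1 for y in data if y <= t) / len(data)

for t in (0.25, 0.5, 0.75):
    print(t, empirical_cdf(samples, t), t ** 2)
```

In each printed row the second and third entries should nearly agree.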


Assume, more generally, that we wish to transform data from the random-number generator X = rand so as to produce a new random variable Y having a given density function \(f_Y\). If we denote this transformation by Y = g(X), with g increasing, we have

\[ \int_{-\infty}^{t} f_Y(x)\,dx = P(Y \le t) = P(g(X) \le t) = P(X \le g^{-1}(t)) = g^{-1}(t), \]

which determines \(g^{-1}\) and hence the transformation g.
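As a worked instance of this recipe (an example chosen here for illustration, not one from the original notes), suppose the target density is 3t² on [0, 1]. Then the inverse of g is the map sending t to t³, so g(x) is the cube root of x. The Python sketch below draws samples this way and compares the sample mean with the theoretical mean, which is the integral of t · 3t² over [0, 1], namely 3/4.

```python
import random

def g(x):
    """Transformation whose inverse is t -> t**3, so g(rand) has density 3t^2 on [0, 1]."""
    return x ** (1.0 / 3.0)

rng = random.Random(11)
samples = [g(rng.random()) for _ in range(20000)]
sample_mean = sum(samples) / len(samples)
print(sample_mean)   # compare with the theoretical mean 3/4
```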


EXERCISES 


1. If X is the random variable having the triangular density curve 
depicted on page 349, compute 


2. Suppose that you perform an experiment where you invoke the 
random-number generator twice and let Z be the sum of the two 
random numbers. 


(a) Compute P(.5 < Z < 1.65) theoretically. 
(b) Estimate P(.5 < Z < 1.65) through a simulation, using the TI code as follows. (I would suggest taking N ≥ 100 trials in this simulation.)


PROGRAM: SIMUL1
:0→C
:Input "N:", N
:For(I,1,N)
:rand + rand→Z
:C+(.5<Z)(Z<1.65)→C
:End
:Disp "PROB: ", C/N
:Stop


The quantity C/N is the estimated probability! 


(c) Construct a histogram for 100 observations of the random variable Z. Try the following code (using, say, N = 100):


PROGRAM: SIMUL2
:Input "N:", N
:{0}→L₁
:For(I,1,N)
:rand + rand→L₁(I)
:End


Once you've done the above, you then use your graphing calculator to graph the histogram of the list variable L₁. (You'll need to decide on sensible window settings.)


3. Let B and C be uniform random variables on the interval [−1, 1]. (Therefore B and C are independent occurrences of 2 rand − 1.) Compute


(a) the probability that the quadratic \(x^2 + Bx + C = 0\) has two distinct real roots;


(b) the probability that the quadratic \(x^2 + Bx + C = 0\) has a single multiple root;


(c) the probability that the quadratic \(x^2 + Bx + C = 0\) has two real roots.


4. Do the same as above where you take 200 samples from a normal distribution having \(\mu = 0\) and \(\sigma = 1\). Create a histogram and draw the corresponding normal density curve simultaneously on your TI calculators.¹⁴


5. Define the random variable by setting Z = rand².

(a) Determine the density function for Z. Before you start, why do you expect the density curve to be skewed to the right?


(b) Collect 200 samples of Z and draw the corresponding his- 
togram. 


(c) How well does your histogram in (b) conform with the density 
function you derived in (a)? 


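6. Consider the density function g defined by setting

\[
g(t) = \begin{cases} 4t & \text{if } 0 \le t \le \tfrac{1}{2}, \\ -4t + 4 & \text{if } \tfrac{1}{2} \le t \le 1. \end{cases}
\]


(a) Show that g is the density function of \(Y = \tfrac{1}{2}X_1 + \tfrac{1}{2}X_2\), where \(X_1\) = rand, \(X_2\) = rand, and \(X_1\) and \(X_2\) are independent. (Hint: just draw a picture in the \(X_1X_2\)-plane to compute \(P(\alpha \le Y \le \beta)\).)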


¹⁴ In drawing your histogram, you will need to make note of the widths of the histogram bars in order to get a good match between the histogram and the normal density curve. For example, if you use histogram bars each of width .5, then with 200 samples the total area under the histogram will be .5 × 200 = 100. Therefore, in superimposing the normal density curve you'll need to multiply by 100 to get total area of 100. (Use Y₁ = 100 * normalpdf(X, 0, 1).)


(b) Write a TI program to generate 200 samples of Y.


(c) Graph the histogram generated in (b) simultaneously with the 
density curve for Y. 


7. Let Z = rand² as in Exercise 5. Show that the density function for Z is given by

\[
f(x) = \begin{cases} \dfrac{1}{2\sqrt{x}} & \text{if } 0 < x \le 1, \\ 0 & \text{otherwise.} \end{cases}
\]


8. We have seen that the density function for the normally-distributed random variable X having mean 0 and standard deviation 1 is

\[ f(x) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}. \]

The \(\chi^2\) random variable with one degree of freedom is the random variable \(X^2\) (whence the notation!). Using the ideas developed above, show that the density function for \(X^2\) is given by

\[ g(x) = \frac{1}{\sqrt{2\pi}}\,x^{-1/2}e^{-x/2}. \]

(More generally, the \(\chi^2\) distribution with n degrees of freedom is the distribution of the sum of n independent \(\chi^2\) random variables with one degree of freedom.)¹⁵


Below are the graphs of \(\chi^2\) with one and with four degrees of freedom.


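¹⁵ The density function for the \(\chi^2\) distribution with n degrees of freedom turns out to be

\[ g(x) = \frac{x^{n/2-1}e^{-x/2}}{2^{n/2}\,\Gamma\!\left(\frac{n}{2}\right)}, \]

where \(\Gamma\!\left(\frac{n}{2}\right) = \left(\frac{n}{2} - 1\right)!\) if n is even. If n = 2k + 1 is odd, then

\[ \Gamma\!\left(\frac{n}{2}\right) = \Gamma\!\left(k + \frac{1}{2}\right) = \left(k - \frac{1}{2}\right)\left(k - \frac{3}{2}\right)\cdots\frac{3}{2}\cdot\frac{1}{2}\cdot\sqrt{\pi}. \]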
[Figure: graphs of the \(\chi^2\) distribution with one degree of freedom and the \(\chi^2\) distribution with four degrees of freedom.]


9. (The Maxwell-Boltzmann density function) The Maxwell-Boltzmann distribution comes from a random variable of the form

\[ Y = \sqrt{X_1^2 + X_2^2 + X_3^2}, \]

where \(X_1, X_2, X_3\) are independent normal random variables with mean 0 and variance \(a^2\). Given the density of the \(\chi^2\) random variable with three degrees of freedom, show that the density of Y is given by

\[ f_Y(t) = \sqrt{\frac{2}{\pi}}\left(\frac{t^2 e^{-t^2/(2a^2)}}{a^3}\right). \]

This distribution is that of the speeds of individual molecules in ideal gases.¹⁶


10. Using integration by parts, show that \(E(\chi^2) = 1\) and that \(\mathrm{Var}(\chi^2) = 2\), where \(\chi^2\) has one degree of freedom. Conclude that the expected value of the \(\chi^2\) random variable with n degrees of freedom is n and the variance is 2n. We'll have much more to say about the \(\chi^2\) distribution and its application in Section 6.6.


11. Here's a lovely exercise.¹⁷ Circles of radius 1 are constructed in the plane so that one has center (2rand, 0) and the other has center (2rand, 1). Compute the probability that these randomly-drawn circles intersect. (Hint: let \(X_1\) and \(X_2\) be independent instances of rand and note that it suffices to compute \(P(4(X_1 - X_2)^2 + 1 \le 4)\).)


¹⁶ It turns out that the standard deviation a is given by \(a = \sqrt{kT/m}\), where T is the temperature (in degrees Kelvin), m is the molecular mass, and k is the Boltzmann constant

\[ k = 1.3806603 \times 10^{-23}\ \mathrm{m^2 \cdot kg/(s^2 \cdot K)}. \]


¹⁷ This is essentially problem #21 on the 2008 AMC (American Mathematics Competitions) contest 12 B.


12. Let X be a random variable with density function f(x), non-zero on the interval \(a \le x \le b\). Compute the density function of cX + d, where c and d are constants with c > 0, in terms of f.


13. Define the random variable Z = rand, and so \(0 \le Z \le 1\).


(a) Determine the density function of \(Y = 10^Z\).


(b) Notice that the random variable Y satisfies \(1 \le Y \le 10\). Show that the probability that a random sample of Y has first digit 1 (i.e., satisfies \(1 \le Y < 2\)) is \(\log_{10} 2 \approx 30.1\%\). (This result is a simplified version of the so-called Benford's Law.)


(c) Data arising from “natural” sources often satisfy the property that their logarithms are roughly uniformly distributed. One statement of Benford's Law is that, contrary to intuition, roughly 30% of the data will have first digit 1. We formalize this as follows. Suppose that we have a random variable \(1 \le Y \le 10^n\), where n is any positive integer, and assume that \(Z = \log_{10} Y\) is uniformly distributed. Show that the probability that a random sample of Y has first digit d, \(1 \le d \le 9\), is

\[ \log_{10}\left(1 + \frac{1}{d}\right). \]


6.2.3. The exponential distribution 


The exponential random variable is best thought of as a continuous 
analog of the geometric random variable. This rather glib statement 
requires a bit of explanation. 

Recall that if X is a geometric random variable with probability 
p, then P(X = k) is the probability that our process (or game) will 
terminate after k independent trials. An immediate consequence of this 
fact is that the conditional probabilities P(X = k+1|X > k) = p, 
and hence is independent of k. In other words, if we have managed 
to survive k trials, then the probability of dying on the k + 1-st trial 
depends only on the parameter p and not on k. Similarly, we see that 
P(X =k+7|X > k) will depend only on p and the integer 7, but not 
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on k. This says that during the game we don’t “age”; our probability 
of dying at the next stage doesn’t increase with age (k). 

We now turn this process into a continuous process, where we can
die at any time $t \ge 0$ and not just at integer times. We want the process
to enjoy essentially the same condition as the geometric, namely that
if X now denotes the present random variable, then the conditional
probability $P(X \ge t+\tau \mid X \ge t)$ should depend on $\tau$ but not on t.
In analogy with the above, this conditional probability represents the
probability of living to time $t + \tau$, given that we have already lived t
units of time.

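We let f represent the density function of X; the above requirement
says that

$$\frac{\int_{t+\tau}^{\infty} f(s)\,ds}{\int_{t}^{\infty} f(s)\,ds} = \text{function of } \tau \text{ alone}. \qquad (*)$$

We denote by F an antiderivative of f satisfying $F(\infty) = 0$.¹⁸ Therefore,

$$1 = \int_0^{\infty} f(s)\,ds = F(\infty) - F(0) = -F(0),$$

and so $F(0) = -1$.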
Next, since $\int_t^{\infty} f(s)\,ds = F(\infty) - F(t) = -F(t)$, we can write (*) in the form

$$\frac{F(t+\tau)}{F(t)} = g(\tau),$$

for some function g. This implies that the quotient $\frac{F(t+\tau)}{F(t)}$ doesn’t
depend on t. Therefore, the derivative with respect to t of this quotient
is 0:

$$\frac{F'(t+\tau)F(t) - F(t+\tau)F'(t)}{F(t)^2} = 0,$$

forcing

$$F'(t+\tau)F(t) = F(t+\tau)F'(t).$$

¹⁸ Since $\int_0^{\infty} f(s)\,ds = 1$, we see that F cannot be unbounded at $\infty$.


But this can be written as

$$\frac{d}{dt}\ln\lvert F(t+\tau)\rvert = \frac{d}{dt}\ln\lvert F(t)\rvert,$$

forcing

$$F(t+\tau) = -F(t)F(\tau),$$

for all t and $\tau$. Finally, if we differentiate both sides of the above with
respect to t and then set t = 0, we arrive at

$$F'(\tau) = -F'(0)F(\tau),$$

which, after setting $\lambda = F'(0)$, easily implies that $F(t) = -e^{-\lambda t}$ for
all $t \ge 0$. Since f is the derivative of F, we conclude finally that the
density function of the exponential distribution must have the form

$$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0.$$

Very easy integrations show that

$$E(X) = \frac{1}{\lambda} \quad\text{and}\quad \mathrm{Var}(X) = \frac{1}{\lambda^2}.$$
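The "no aging" property that drove this derivation can be checked directly for the density just obtained. The Python sketch below is an illustration under assumed values (rate 0.5, seed, sample size, and all function names are choices made here): it computes $P(X \ge t+\tau \mid X \ge t)$ from the survival function $e^{-\lambda t}$ and also estimates it by simulation.

```python
import math
import random

def survival(lam, t):
    """P(X >= t) for an exponential random variable with rate lam."""
    return math.exp(-lam * t)

def conditional_survival(lam, t, tau):
    """P(X >= t + tau | X >= t), computed from the survival function."""
    return survival(lam, t + tau) / survival(lam, t)

def simulated_conditional_survival(lam, t, tau, num_samples=200_000, seed=1):
    """Monte Carlo estimate of P(X >= t + tau | X >= t)."""
    rng = random.Random(seed)              # the seed is an arbitrary choice
    samples = [rng.expovariate(lam) for _ in range(num_samples)]
    survivors = [x for x in samples if x >= t]
    return sum(1 for x in survivors if x >= t + tau) / len(survivors)

if __name__ == "__main__":
    for t in (0.0, 1.0, 5.0):
        # The exact value should not change with t.
        print(t, conditional_survival(0.5, t, 2.0))
    print(simulated_conditional_survival(0.5, 1.0, 2.0))
```

For every t the exact value reduces to $e^{-\lambda\tau}$, which is the continuous analogue of the geometric case.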


The exponential distribution is often used in reliability engineering 
to describe units having a constant failure rate (i.e., age independent). 
Other applications include 


• modeling the time to failure of an item (like a light bulb; see
Exercise 2, below). The parameter $\lambda$ is often called the failure
rate;

• modeling the time to the next telephone call;

• distance between roadkill on a given highway;

• number of days between accidents at a given intersection.


EXERCISES 


1. Prove the assertions made above concerning the exponential random
variable X with density $f(t) = \lambda e^{-\lambda t}$, $t \ge 0$, viz., that
$E(X) = 1/\lambda$ and that $\mathrm{Var}(X) = 1/\lambda^2$.


2. Suppose that the useful life T of a light bulb produced by a particular
company is given by the density function $f(t) = 0.01e^{-0.01t}$,
where t is measured in hours. Therefore, the probability that this
light bulb fails somewhere between times $t_1$ and $t_2$ is given by the
integral

$$P(t_1 \le T \le t_2) = \int_{t_1}^{t_2} f(t)\,dt.$$

(a) The probability that the bulb will not burn out before t hours
is a function of t and is often referred to as the reliability of
the bulb.

(b) For which value of t is the reliability of the bulb equal to 1/2?
Interpret this value of t.


3. Suppose that your small company had a single secretary and that
she determined that on a given day, the measured times elapsed
between 30 consecutive received telephone calls were (in minutes)


6.8, 0.63, 5.3, 3.8, 3.5, 7.2, 16.0, 5.5, 7.2, 1.1, 1.4, 1.8, 0.28, 1.2,
1.6, 5.4, 5.4, 3.1, 1.3, 3.7, 7.5, 3.0, 0.03, 0.64, 1.5, 6.9, 0.01, 4.7,
1.4, 5.0.


Assuming that this particular day was a typical day, use these data 
to estimate the mean wait time between phone calls. Assuming 
that the incoming phone calls roughly follow an exponential dis- 
tribution with your estimated mean, compute the probability that 
after a given call your secretary will receive another call within 
two minutes. 


4. Under the same assumptions as in the above exercise, roughly how
long will it take for the tenth call of the day to be taken by
your secretary?


362 


CHAPTER 6 INFERENTIAL STATISTICS 


5. You have determined that along a stretch of highway, you see on
average one dead animal on the road every 2.1 km. Assuming an
exponential distribution with this mean, what is the probability
that after seeing the last roadkill you will drive 8 km before seeing
the next one?


6. (Harder question) Assume, as in the above problem, that you see on
average one dead animal every 2.1 km along the above-mentioned
highway. What is the probability that you will drive at least 10
km before seeing the next two dead animals? (Hint: Let $X_1$ be
the distance required to spot the first roadkill, and let $X_2$ be the
distance required to spot the second roadkill. You’re trying to
compute $P(X_1 + X_2 \ge 10)$; try looking ahead to page 370.)


7. We can simulate the exponential distribution on a TI-series calculator,
as follows. We wish to determine the transforming function
g which, when applied to rand, yields the exponential random
variable Y with density function $f_Y(t) = \lambda e^{-\lambda t}$. Next, from
page 353 we see that, in fact,

$$\int_0^t f_Y(x)\,dx = \int_0^t \lambda e^{-\lambda x}\,dx = g^{-1}(t).$$

That is to say, $1 - e^{-\lambda t} = g^{-1}(t)$.

(a) Show that this gives the transforming function
$g(x) = -\frac{1}{\lambda}\ln(1 - x)$.

(b) On your TI calculators, extract 100 random samples of the
exponential distribution with $\lambda = .5$ (so $\mu = 2$) via the command

(−1/.5) ln(1 − rand(100)) → L₁.

This will place 100 samples into the list variable L₁.

(c) Draw a histogram of these 100 samples. Does this look right?

8. (This is an extended exercise.) Continuing on the theme in Exercise 7,
we can similarly use the TI calculator to generate samples
of a geometric random variable. Just as we were able above to
transform rand into an exponential random variable, we shall
(approximately) transform the TI random integer variable “randInt”
into a geometric random variable.

First of all, the random integer generator has three inputs and has
the form randInt($n_{\min}$, $n_{\max}$, N). The output consists of a sequence
of N randomly and uniformly distributed integers between
$n_{\min}$ and $n_{\max}$. We shall, for convenience, take $n_{\min} = 1$ and
set $n = n_{\max}$. We let Y be the geometric random variable (with
parameter p), and let X be a randomly-generated uniformly distributed
integer $1 \le X \le n$. The goal is to find a function g such
that Y = g(X). This will allow us to use the TI calculator to
generate samples of a geometric random variable (and therefore of
a negative binomial random variable).

Note first that

$$P(Y \le k) = p + p(1-p) + p(1-p)^2 + \cdots + p(1-p)^{k-1} = 1 - (1-p)^k, \quad k \ge 0,$$

and that

$$P(X \le h) = \frac{h}{n}, \quad 1 \le h \le n.$$

At this point we see a potential problem in transforming from the
uniform variable to the geometric: the geometric random variable
has an infinite number of possible outcomes (with decreasing probabilities)
and the uniform random variable is an integer between 1
and n. Therefore, we would hope not to lose too much information
by allowing n to be reasonably large ($n \approx 25$ seems pretty good).
At any rate, we proceed in analogy with the analysis on page 353:
assuming that Y = g(X) we have

$$1 - (1-p)^k = P(Y \le k) = P(g(X) \le k) = P(X \le g^{-1}(k)) = \frac{g^{-1}(k)}{n}.$$

Solving for g we get

$$g(h) = \frac{\ln\left(1 - \frac{h}{n}\right)}{\ln(1 - p)}.$$

However, we immediately see that if $h = n = n_{\max}$ then
g(h) is undefined (and the TI will generate an error whenever
h = n). This makes mathematical sense as there is no value of
k for which $P(Y \le k) = 1$. One remedy is to let the value of n
in the above expression for g be slightly larger than $n_{\max}$. Of
course, having done this, the transformed values will no longer be
integers, so we’ll need to round to the nearest integer. On the TI
calculator the value int(x + .5) will have the effect of rounding to
the nearest integer.


NOW DO THIS: Generate 100 samples of the geometric random
variable with parameter p = .25 using the command

ln(1 − randInt(1,30,100)/30.01) / ln(1 − .25) → L₁

followed by the command

int(L₁ + .5) → L₁.

This will store 100 randomly-generated integer samples in the list
variable L₁. You should check to see if they appear to follow
the geometric distribution with parameter p = .25. (Start by
comparing the mean of your data with the theoretical mean of the
geometric random variable!)


9. Let X be an exponential random variable with failure rate $\lambda$, and
let $Y = X^{1/\alpha}$, $\alpha > 0$. Using the idea developed on page 353, compute
the density function for Y. This gives the so-called Weibull
distribution.


6.3. Parameters and Statistics 


Suppose that we have a continuous random variable X having density 
function fx. Associated with this random variable are a few parame- 
ters, the mean (and also the median and the mode) and the vari- 
ance of X. In analogy with discrete random variables they are defined 
as follows. 


Mean of X. We set

$$E(X) = \mu_X = \int_{-\infty}^{\infty} x f_X(x)\,dx.$$

Median of X. This is just the half-way point of the distribution,
that is, if m is the median, we have $P(X \le m) = \frac{1}{2} = P(X \ge m)$.
In terms of the density function, this is just the value m for which

$$\int_{-\infty}^{m} f_X(x)\,dx = \frac{1}{2}.$$


Mode of X. This is just the value of x at which the density function 
assumes its maximum. (Note, then, that the mode might not be 
unique: a distribution might be “bimodal” or even “multimodal.” ) 


The mean, median, and mode measure “central tendency.”

Variance of X. We set

$$\mathrm{Var}(X) = \sigma_X^2 = E\bigl((X - \mu_X)^2\bigr).$$

As we shall see below,

$$\mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu_X)^2 f_X(x)\,dx$$

(though most texts gloss over this point).

The positive square root $\sigma_X$ of Var(X) is called the standard deviation
of X.

6.3.1 Some theory 


If we adapt the arguments beginning on page 319 to continuous random
variables, we can give a heuristic argument that the expectation of the
sum of two continuous random variables is the sum of the expectations.
The basic idea is that there is a joint density function $f_{XY}$ which gives
probabilities such as

$$P(a \le X \le b \text{ and } c \le Y \le d) = \int_c^d\!\int_a^b f_{XY}(x,y)\,dx\,dy.$$

These can be represented in terms of conditional probabilities in the
usual way: $f_{XY}(x,y) = f_X(x|y)f_Y(y)$; furthermore, one has

$$f_X(x) = \int_{-\infty}^{\infty} f_X(x|y)f_Y(y)\,dy.$$

Accepting all of this stuff, one proceeds exactly as on pages 319-320:

$$\begin{aligned}
\mu_{X+Y} &= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} (x+y) f_{XY}(x,y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x f_{XY}(x,y)\,dx\,dy + \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y f_{XY}(x,y)\,dx\,dy \\
&= \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} x f_X(x|y)f_Y(y)\,dy\,dx + \int_{-\infty}^{\infty}\!\int_{-\infty}^{\infty} y f_Y(y|x)f_X(x)\,dx\,dy \\
&= \int_{-\infty}^{\infty} x f_X(x)\,dx + \int_{-\infty}^{\infty} y f_Y(y)\,dy \\
&= \mu_X + \mu_Y.
\end{aligned}$$

A similar argument, together with mathematical induction, can be
used to show that

$$E(X_1 + X_2 + \cdots + X_k) = E(X_1) + E(X_2) + \cdots + E(X_k).$$


If the random variables X and Y are independent, then we may write
the density function $f_{XY}(x,y)$ as a product: $f_{XY}(x,y) = f_X(x)f_Y(y)$,
from which it follows immediately that

E(XY) = E(X)E(Y), where X and Y are independent.


The additivity of expectation shows, in particular, the following very
important result. Assume that we are to take n independent samples from
a given population having mean $\mu$. If $\overline{X}$ denotes the average of these
samples, then $\overline{X}$ is itself a random variable and

$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n},$$

where $X_1, X_2, \ldots, X_n$ are independent random variables from this
population. We have, therefore, that

$$E(\overline{X}) = \frac{E(X_1) + E(X_2) + \cdots + E(X_n)}{n} = \mu.$$


We now turn our attention to variance. However, a couple of preliminary
observations are in order. First of all, let X be a continuous
random variable, let a be a real constant, and set Y = X + a. We
wish first to compare the density functions $f_Y$ and $f_X$. Perhaps it’s
already obvious that $f_Y(x) = f_X(x - a)$, but a formal proof might be
instructive. We have

$$\int_{-\infty}^{t} f_Y(x)\,dx = P(Y \le t) = P(X + a \le t) = P(X \le t - a) = \int_{-\infty}^{t-a} f_X(x)\,dx.$$

But a simple change of variable shows that

$$\int_{-\infty}^{t-a} f_X(x)\,dx = \int_{-\infty}^{t} f_X(x - a)\,dx.$$

In other words, for all real numbers t, we have

$$\int_{-\infty}^{t} f_{X+a}(x)\,dx = \int_{-\infty}^{t} f_X(x - a)\,dx.$$

This implies (e.g., by the Fundamental Theorem of Calculus) that

$$f_{X+a}(x) = f_X(x - a) \qquad (*)$$

for all $x \in \mathbb{R}$.
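Next, we would like to compute the density function for the random
variable $Y = X^2$ in terms of that of X. To do this, note that

$$\int_0^t f_{X^2}(x)\,dx = P(X^2 \le t) = P(-\sqrt{t} \le X \le \sqrt{t}) = \int_{-\sqrt{t}}^{\sqrt{t}} f_X(x)\,dx.$$

An application of the Fundamental Theorem of Calculus (in which the
lower limit $-\sqrt{t}$ contributes with a plus sign, since its derivative is
negative) gives

$$f_{X^2}(x) = \frac{1}{2\sqrt{x}}\bigl(f_X(\sqrt{x}) + f_X(-\sqrt{x})\bigr). \qquad (**)$$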
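As a quick check of the formula for the density of $X^2$, here is a worked instance under an assumed example (X uniformly distributed on [−1, 1], a choice made purely for illustration). Both branches $\pm\sqrt{x}$ contribute, and the resulting density integrates to 1, as it must.

```latex
f_X(x) = \tfrac{1}{2} \quad (-1 \le x \le 1), \qquad
f_{X^2}(x) = \frac{1}{2\sqrt{x}}\left(\tfrac{1}{2} + \tfrac{1}{2}\right)
           = \frac{1}{2\sqrt{x}} \quad (0 < x \le 1),
\qquad
\int_0^1 \frac{dx}{2\sqrt{x}} = \sqrt{x}\,\Big|_0^1 = 1 .
```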
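Using equations (*) and (**), we can compute the variance of the continuous
random variable X having mean $\mu$, as follows. We have

$$\begin{aligned}
\mathrm{Var}(X) &= E\bigl((X - \mu)^2\bigr) \\
&= \int_0^{\infty} x f_{(X-\mu)^2}(x)\,dx \\
&= \frac{1}{2}\int_0^{\infty} \sqrt{x}\, f_{X-\mu}(\sqrt{x})\,dx + \frac{1}{2}\int_0^{\infty} \sqrt{x}\, f_{X-\mu}(-\sqrt{x})\,dx \\
&= \int_0^{\infty} u^2 f_{X-\mu}(u)\,du + \int_{-\infty}^{0} u^2 f_{X-\mu}(u)\,du \qquad (u = \pm\sqrt{x}) \\
&= \int_{-\infty}^{\infty} u^2 f_{X-\mu}(u)\,du \\
&= \int_{-\infty}^{\infty} u^2 f_X(u + \mu)\,du \\
&= \int_{-\infty}^{\infty} (u - \mu)^2 f_X(u)\,du \\
&= \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx.
\end{aligned}$$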
SECTION 6.3 PARAMETERS AND STATISTICS 369 
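This proves the assertion made on page 365. Next, we have

$$\begin{aligned}
E(X^2) &= \int_0^{\infty} x f_{X^2}(x)\,dx \\
&= \frac{1}{2}\int_0^{\infty} \sqrt{x}\, f_X(\sqrt{x})\,dx + \frac{1}{2}\int_0^{\infty} \sqrt{x}\, f_X(-\sqrt{x})\,dx \\
&= \int_{-\infty}^{\infty} u^2 f_X(u)\,du \\
&= \int_{-\infty}^{\infty} x^2 f_X(x)\,dx.
\end{aligned}$$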


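Finally,

$$\begin{aligned}
\mathrm{Var}(X) &= \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} (x^2 - 2x\mu + \mu^2) f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} x^2 f_X(x)\,dx - 2\mu\int_{-\infty}^{\infty} x f_X(x)\,dx + \mu^2\int_{-\infty}^{\infty} f_X(x)\,dx \\
&= E(X^2) - \mu^2,
\end{aligned}$$

exactly as for discrete random variables (page 321).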


We need one final theoretical result concerning variance. Assume
that we take two independent measurements X and Y from a given
population both having mean $\mu$. What is the variance of X + Y? This
will require the two results sketched on page 366, namely

(i) that E(X + Y) = E(X) + E(Y), whether or not X and Y are independent, and
(ii) that if X and Y are independent, then E(XY) = E(X)E(Y).

Using these two facts, we can proceed as follows:

$$\begin{aligned}
\mathrm{Var}(X + Y) &= E\bigl((X + Y - 2\mu)^2\bigr) \\
&= E(X^2 + Y^2 + 2XY - 4\mu X - 4\mu Y + 4\mu^2) \\
&= E(X^2) + E(Y^2) + 2\mu^2 - 4\mu^2 - 4\mu^2 + 4\mu^2 \\
&= \bigl(E(X^2) - \mu^2\bigr) + \bigl(E(Y^2) - \mu^2\bigr) = \mathrm{Var}(X) + \mathrm{Var}(Y).
\end{aligned}$$

Convolution and the sum of independent random variables. Assume that
X and Y are independent random variables with density functions $f_X$
and $f_Y$, respectively. We shall determine the distribution of X + Y in
terms of $f_X$ and $f_Y$.


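To do this we observe that

$$\begin{aligned}
f_{X+Y}(t) &= \frac{d}{dt} P(X + Y \le t) \\
&= \frac{d}{dt} P(Y \le t - X) \\
&= \frac{d}{dt} \int_{-\infty}^{\infty}\!\int_{-\infty}^{t-x} f_X(x) f_Y(y)\,dy\,dx \\
&= \int_{-\infty}^{\infty} f_X(x) f_Y(t - x)\,dx.
\end{aligned}$$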


The last expression above is called the convolution of the density
functions.¹⁹ We write this more simply as

$$f_{X+Y}(t) = f_X * f_Y(t),$$

where for any real-valued²⁰ functions f and g, the convolution is defined
by setting

$$f * g(t) = \int_{-\infty}^{\infty} f(x)\, g(t - x)\,dx.$$


From the above we can easily compute the distribution of the difference
X − Y of the independent random variables X and Y. Note first
that the density of −Y is clearly the function $f_{-Y}(t) = f_Y(-t)$, $t \in \mathbb{R}$.
This implies that the density of X − Y is given by

$$f_{X-Y}(t) = f_X * f_{-Y}(t) = \int_{-\infty}^{\infty} f_X(x) f_{-Y}(t - x)\,dx = \int_{-\infty}^{\infty} f_X(x) f_Y(x - t)\,dx.$$
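The convolution integral is easy to approximate numerically, which gives a way to check a hand computation. The Python sketch below is an illustration under stated assumptions: it uses the density of rand for both X and Y, a midpoint rule, and an integration window [−5, 5] that is adequate only for densities supported near the origin; the function names are chosen here. For two independent copies of rand the sum has the triangular density t on [0, 1] and 2 − t on [1, 2].

```python
def uniform_density(x):
    """Density of rand: 1 on [0, 1], 0 elsewhere."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def convolve(f, g, t, lower=-5.0, upper=5.0, steps=100_000):
    """Midpoint-rule approximation of (f * g)(t) = integral of f(x) g(t - x) dx."""
    h = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        x = lower + (i + 0.5) * h
        total += f(x) * g(t - x)
    return total * h

def reflected(g):
    """Density of -Y, given the density g of Y."""
    return lambda y: g(-y)

if __name__ == "__main__":
    for t in (0.5, 1.0, 1.5):
        print(t, convolve(uniform_density, uniform_density, t))       # density of X + Y
    print(convolve(uniform_density, reflected(uniform_density), -0.5))  # density of X - Y
```

The last line evaluates the density of X − Y at −0.5 by convolving with the reflected density, exactly as in the formula for $f_{X-Y}$.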


¹⁹ Of course, the notion of convolution was already introduced in Exercise 5 on page 261.
²⁰ Actually, there are additional hypotheses required to guarantee the existence of the convolution
product.




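Next, continuing to assume that X and Y are independent random
variables, we proceed to compute E(X + Y). We have

$$\begin{aligned}
E(X + Y) &= \int_{-\infty}^{\infty} x\,(f_X * f_Y)(x)\,dx \\
&= \int_{-\infty}^{\infty} x \int_{-\infty}^{\infty} f_X(t) f_Y(x - t)\,dt\,dx \\
&= \int_{-\infty}^{\infty} f_X(t) \int_{-\infty}^{\infty} x f_Y(x - t)\,dx\,dt \\
&= \int_{-\infty}^{\infty} f_X(t) \int_{-\infty}^{\infty} (t + x) f_Y(x)\,dx\,dt \\
&= \int_{-\infty}^{\infty} f_X(t)\bigl(t + E(Y)\bigr)\,dt \\
&= E(X) + E(Y).
\end{aligned}$$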
EXERCISES 


1. Compute the mean and the variance of the random variable rand.
Recall that rand has density function

$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$


2. Compute the mean and the variance of the random variable $\sqrt{\text{rand}}$.
Recall that $\sqrt{\text{rand}}$ has density function

$$f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$


3. Compute the mean and the variance of the random variable rand².
(See Exercise 7 on page 356.)


4. Compute the mean and the variance of the random variable having
density function given in Exercise 6 on page 355.


5. In Exercise 6 on page 362 you were asked essentially to investigate
the distribution of $X_1 + X_2$ where $X_1$ and $X_2$ were independent
exponential random variables, each with mean $\mu = 1/\lambda$. Given
that the density function of each is $f(x) = \lambda e^{-\lambda x}$ and given that the
sum has as density the convolution of f with itself (see page 370),
compute this density.


6. In Exercise 8 on page 356 we showed that the density function
for the $\chi^2$ random variable with one degree of freedom is
$f(x) = \frac{1}{\sqrt{2\pi}}\, x^{-1/2} e^{-x/2}$. Using the fact that the $\chi^2$ with two degrees of
freedom is the sum of independent $\chi^2$ random variables with one
degree of freedom, and given that the density function for the
sum of independent random variables is the convolution of the two
corresponding density functions, compute the density function for
the $\chi^2$ random variable with two degrees of freedom. (See the
footnote on page 356.)


7. Let f be an even real-valued function such that $\int_{-\infty}^{\infty} f(x)\,dx$ exists.
Show that f * f is also an even real-valued function.


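8. Consider the function defined by setting

$$f(x) = \begin{cases} x^2 & \text{if } -1 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

(a) Show that

$$f * f(x) = \begin{cases} \displaystyle\int_{-1}^{x+1} y^2 (x - y)^2\,dy & \text{if } -1 \le x \le 0, \\[2ex] \displaystyle\int_{x-1}^{1} y^2 (x - y)^2\,dy & \text{if } 0 \le x \le 1. \end{cases}$$

(b) Conclude that f * f is not differentiable at x = 0.

(c) Show that the graph of f * f is as depicted in the figure.

[Figure: graph of y = f * f(x).]

(d) Also, compute $\int_{-\infty}^{\infty} f * f(x)\,dx$, and compare with $\int_{-\infty}^{\infty} f(x)\,dx$.
Does this make sense? Can you formulate a general statement?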


6.3.2 Statistics: sample mean and variance 


In all of the above discussions, we have either been dealing with random
variables whose distributions are known, and hence whose mean and
variance can (in principle) be computed, or we have been deriving
theoretical aspects of the mean and variance of a random variable. While
interesting and important, these are intellectual luxuries that usually
don’t present themselves in the real world. If, for example, I were
charged with the analysis of the mean number of on-the-job injuries
in a company in a given year, I would be tempted to model this with
a Poisson distribution. Even if this were a good assumption, I probably
wouldn’t know the mean of this distribution. Arriving at a “good”
estimate of the mean and determining whether the Poisson model is a
“good” model are both statistical questions.

Estimation of a random variable’s mean will be the main focus of the 
remainder of the present chapter, with a final section on the “goodness 
of fit” of a model. 


We turn now to statistics. First of all, any particular values (outcomes)
of a random variable or random variables are collectively known
as data. A statistic is any function of the data. Two particularly
important statistics are as follows. A sample (of size n) from a
distribution with random variable X is a set of n independent measurements
$x_1, x_2, \ldots, x_n$ of this random variable. Associated with this sample are


The sample mean: this is defined by setting

$$\overline{x} = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$


The basic reason for considering the sample mean is the following.
Suppose that we have taken the samples $x_1, x_2, \ldots, x_n$ from
a population whose mean is $\mu$. Would we expect that $\overline{x} \approx \mu$?
Fortunately, the answer is in agreement with our intuition: we
really do expect the sample mean to approximate the
theoretical (or population) mean. The reason, simply, is that if we
form the random variable

$$\overline{X} = \frac{X_1 + X_2 + \cdots + X_n}{n},$$

then it is clear that $E(\overline{X}) = \mu$. (Indeed, we already noted this
fact back on page 324.) That is to say, when we take n independent
samples from a population, then we “expect” to get back the
theoretical mean $\mu$. Another way to state this is to say that $\overline{x}$ is
an unbiased estimate of the population mean $\mu$.


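Next, notice that since $X_1, X_2, \ldots, X_n$ are independent, we have
that

$$\begin{aligned}
\mathrm{Var}(\overline{X}) &= \mathrm{Var}\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) \\
&= \frac{1}{n^2}\,\mathrm{Var}(X_1 + X_2 + \cdots + X_n) \\
&= \frac{1}{n^2}\bigl(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \cdots + \mathrm{Var}(X_n)\bigr) \\
&= \frac{\sigma^2}{n},
\end{aligned}$$

where $\sigma^2$ denotes the variance of the population.
This shows why it’s best to take “large” samples: the “sampling
statistic” $\overline{X}$ has variance which tends to zero as the sample size
tends to infinity.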
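The shrinking variance of the sample mean can be observed in a small experiment. The Python sketch below is an illustration only: it assumes the population is rand (variance 1/12), and the function name, seed, and number of repetitions are choices made here. It estimates the variance of the mean of n draws, to be compared with $\sigma^2/n$.

```python
import random

def variance_of_sample_mean(n, num_means=20_000, seed=3):
    """Empirical variance of the mean of n independent draws from rand."""
    rng = random.Random(seed)              # the seed is an arbitrary choice
    means = []
    for _ in range(num_means):
        means.append(sum(rng.random() for _ in range(n)) / n)
    grand_mean = sum(means) / num_means
    return sum((m - grand_mean) ** 2 for m in means) / num_means

if __name__ == "__main__":
    for n in (5, 50):
        # Compare the simulated variance with sigma^2 / n = (1/12) / n.
        print(n, variance_of_sample_mean(n), (1.0 / 12.0) / n)
```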


The sample variance: this is defined by setting

$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \overline{x})^2.$$

The sample standard deviation: $s_x = \sqrt{s_x^2}$.


If $X_1, X_2, \ldots, X_n$ represent independent random variables having the
same distribution, then

$$S_x^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \overline{X})^2$$

is a random variable. Once the sample has been taken, this random
variable has taken on a value, $S_x = s_x$, and is, of course, no longer
random. The relationship between $S_x$ and $s_x$ is the same as the
relationship between $\overline{X}$ (random variable before collecting the sample) and
$\overline{x}$ (the computed average of the sample).
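Since $S_x^2$ is a random variable, its expected value can be approximated by averaging it over many simulated samples. The following Python sketch is an illustration only: the population rand (variance 1/12), the sample size n = 5, the seed, and all function names are assumptions made for this example. It averages the sum of squared deviations divided by n − 1 and, for comparison, divided by n.

```python
import random

def sum_of_squared_deviations(sample):
    """Sum of (x_i - xbar)^2 over the sample."""
    xbar = sum(sample) / len(sample)
    return sum((x - xbar) ** 2 for x in sample)

def average_variance_estimates(n, num_samples=20_000, seed=4):
    """Average the (n - 1)-divisor and n-divisor variance estimates over many samples."""
    rng = random.Random(seed)              # the seed is an arbitrary choice
    total_n_minus_1 = 0.0
    total_n = 0.0
    for _ in range(num_samples):
        sample = [rng.random() for _ in range(n)]
        ss = sum_of_squared_deviations(sample)
        total_n_minus_1 += ss / (n - 1)
        total_n += ss / n
    return total_n_minus_1 / num_samples, total_n / num_samples

if __name__ == "__main__":
    with_n_minus_1, with_n = average_variance_estimates(5)
    print(with_n_minus_1, with_n, 1.0 / 12.0)
```

In such a run one would expect the first average to land near 1/12 ≈ 0.0833 and the second near (4/5)(1/12) ≈ 0.0667, which is the bias that the divisor n − 1 removes.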

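You might wonder why we divide by n − 1 rather than n, which
perhaps seems more intuitive. The reason, ultimately, is that

$$E(S_x^2) = E\left(\frac{1}{n-1}\sum_{i=1}^{n} (X_i - \overline{X})^2\right) = \sigma^2.$$

A sketch of a proof is given in the footnote.²¹ (We remark in passing
that many authors do define the sample variance as above, except that
the sum is divided by n instead of n − 1. While the resulting statistic
is a biased estimate of the population variance, it does enjoy the
property of being what’s called a maximum-likelihood estimate of
the population variance. A fuller treatment of this can be found in any
reasonably advanced statistics textbook.)

Naturally, if we take a sample of size n from a population having
mean $\mu$ and variance $\sigma^2$, we would expect that the sample mean and
variance would at least approximate $\mu$ and $\sigma^2$, respectively. In practice,
however, given a population we rarely know the population mean and
variance; we use the statistics $\overline{x}$ and $s_x^2$ in order to estimate them (or
to make hypotheses about them).

²¹ First of all, note that, by definition,

$$X_i - \mu = (X_i - \overline{X}) + (\overline{X} - \mu),$$

from which it follows that

$$(X_i - \mu)^2 = (X_i - \overline{X})^2 + 2(X_i - \overline{X})(\overline{X} - \mu) + (\overline{X} - \mu)^2.$$

Now watch this:

$$\begin{aligned}
\sum_{i=1}^{n} (X_i - \mu)^2 &= \sum_{i=1}^{n} (X_i - \overline{X})^2 + 2(\overline{X} - \mu)\sum_{i=1}^{n} (X_i - \overline{X}) + n(\overline{X} - \mu)^2 \\
&= \sum_{i=1}^{n} (X_i - \overline{X})^2 + n(\overline{X} - \mu)^2 \qquad \Bigl(\text{since } \sum_{i=1}^{n} (X_i - \overline{X}) = 0\Bigr).
\end{aligned}$$

Next, since $E(\overline{X}) = \mu$, we have $E\bigl(n(\overline{X} - \mu)^2\bigr) = nE\bigl((\overline{X} - \mu)^2\bigr) = n\mathrm{Var}(\overline{X}) = \sigma^2$. Therefore, we
take the expectation of the above random variables:

$$\begin{aligned}
n\sigma^2 &= E\left(\sum_{i=1}^{n} (X_i - \mu)^2\right) \\
&= E\left(\sum_{i=1}^{n} (X_i - \overline{X})^2 + n(\overline{X} - \mu)^2\right) \\
&= E\left(\sum_{i=1}^{n} (X_i - \overline{X})^2\right) + E\bigl(n(\overline{X} - \mu)^2\bigr) \\
&= E\left(\sum_{i=1}^{n} (X_i - \overline{X})^2\right) + \sigma^2,
\end{aligned}$$

from which we see that

$$E\left(\sum_{i=1}^{n} (X_i - \overline{X})^2\right) = (n-1)\sigma^2, \quad\text{that is,}\quad E(S_x^2) = \sigma^2.$$


6.3.3. The distribution of $\overline{X}$ and the Central Limit Theorem


The result of this section is key to all of sampling theory. As we might
guess, one of the most important statistics we’re apt to encounter is
the mean $\overline{x}$ of n independent samples taken from some population.
Underlying this is the random variable $\overline{X}$ with parameters

$$E(\overline{X}) = \mu \quad\text{and}\quad \mathrm{Var}(\overline{X}) = \frac{\sigma^2}{n}.$$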

Let’s start by getting our hands dirty.

SIMULATION 1. Let’s take 100 samples of the mean (where each mean
is computed from 5 observations) from the uniform distribution having
density function

$$f(x) = \begin{cases} 1 & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

We display the corresponding histogram:

[Histogram: 100 samples of the mean, n = 5; horizontal axis from 0 to 1.]

SIMULATION 2. Here, let’s take 100 samples of the mean (where each
mean is computed from 50 observations) from the uniform distribution
above. The resulting histogram is as below.

[Histogram: 100 samples of the mean, n = 50; horizontal axis from 0.3 to 0.7.]


There are two important observations to make here. First of all, even
though we haven’t sampled from a normal distribution, the sample
means appear to be somewhat normally distributed (more so in the
n = 50 case). Next, notice that the range of the samples of the mean
for n = 50 is much less than for n = 5. This is because the standard
deviations for these two sample means are respectively $\frac{\sigma}{\sqrt{5}}$ and $\frac{\sigma}{\sqrt{50}}$,
where $\sigma$ is the standard deviation of the given uniform distribution.²²


²² The variance of this distribution was to be computed in Exercise 1 on page 371; the result is
$\sigma^2 = \frac{1}{12}$.


SIMULATION 3. Let’s take 100 samples of the mean (where each mean
is computed from 5 observations) from the distribution having density
function

$$f(x) = \begin{cases} 2x & \text{if } 0 \le x \le 1, \\ 0 & \text{otherwise.} \end{cases}$$

(Recall that this is the density function for $\sqrt{\text{rand}}$.) We display the
corresponding histogram.

[Histogram: 100 samples of the mean, n = 5; horizontal axis from 0.3 to 1.]


SIMULATION 4. Let’s take 100 samples of the mean (where each mean
is computed from 50 observations) from the distribution having the same
density function as above. We display the corresponding histogram.

[Histogram: 100 samples of the mean, n = 50; horizontal axis from 0.62 to 0.72.]

Again, note the tendency toward a normal distribution with a relatively
narrow spread (small standard deviation).
The above is codified in the “Central Limit Theorem”:


Central Limit Theorem. The sample mean $\overline{X}$ taken from n samples
of a distribution with mean $\mu$ and variance $\sigma^2$ has a distribution which
as $n \to \infty$ becomes arbitrarily close to the normal distribution with
mean $\mu$ and variance $\frac{\sigma^2}{n}$.


Perhaps a better way to state the Central Limit Theorem is as follows.
If Z is the normal random variable with mean 0 and variance 1,
then for any real number z,

$$\lim_{n \to \infty} P\left(\frac{\overline{X} - \mu}{\sigma/\sqrt{n}} < z\right) = P(Z < z).$$

6.4 Confidence Intervals for the Mean of a Population


A major role of statistics is to provide reasonable methods by which
we can make inferences about the parameters of a population. This
is important as we typically never know the parameters of a given
population.²³ When giving an estimate of the mean of a population, one
often gives an interval estimate, together with a level of confidence.
So, for example, I might collect a sample from a population and measure
that the mean $\overline{x}$ is 24.56. Reporting this estimate by itself is not terribly
useful, as it is highly unlikely that this estimate coincides with the
population mean. So the natural question is “how far off can this
estimate be?” Again, not knowing the population mean, this question is
impossible to answer. In practice what is done is to report a confidence
interval together with a confidence level. Therefore, in continuing
the above hypothetical example, I might report that


“The 95% confidence interval for the mean is 24.56 ± 2.11.”
or that


“The mean falls within 24.56 ± 2.11 with 95% confidence.”


²³ In fact we almost never even know the population’s underlying distribution. However, thanks
to the Central Limit Theorem, as long as we take large enough samples, we can be assured of being
“pretty close” to a normal distribution.


SECTION 6.4 CONFIDENCE INTERVALS 381 


A very common misconception is that the above two statements
mean that the population mean lies within the above reported interval
with probability 95%. However, this is meaningless: either the
population mean does or doesn’t lie in the above interval; there is no
randomness associated with the interval reported! As we’ll see, the
randomness is associated with the process of arriving at the interval
itself. If 100 statisticians go out and compute 95% confidence intervals
for the mean, then roughly 95 of the computed confidence intervals will
actually contain the true population mean. Unfortunately, we won’t
know which ones actually contain the true mean!


6.4.1 Confidence intervals for the mean; known population 
variance 


While it is highly unreasonable to assume that we would know the
variance of a population but not know the mean, the ensuing discussion
will help to serve as a basis for more practical (and realistic) methods
to follow. Therefore, we assume that we wish to estimate the mean $\mu$ of
a population whose variance $\sigma^2$ is known. It follows, then, that if $\overline{X}$ is
the random variable representing the mean of n independently-selected
samples, then

• the mean of $\overline{X}$ is $\mu$,

• the variance of $\overline{X}$ is $\frac{\sigma^2}{n}$, and

• (if n is “large”)²⁴ the random variable $\overline{X}$ is approximately normally
distributed.

²⁴ A typical benchmark is to use sample sizes of $n \ge 30$ in order for the normality assumption to
be reasonable. On the other hand, if we know, or can assume, that we are sampling from a normal
population in the first place, then $\overline{X}$ will be normally distributed for any n.

In the ensuing discussion, we shall assume either that we are sampling
from an (approximately) normal population or that n is relatively
large. In either case, $\overline{X}$ will be (approximately) normally distributed.
We have that

$$E(\overline{X}) = \mu \quad\text{and}\quad \mathrm{Var}(\overline{X}) = \frac{\sigma^2}{n};$$

therefore the random variable $Z = \frac{\overline{X} - \mu}{\sigma/\sqrt{n}}$ is (approximately) normally
distributed with mean 0 and standard deviation 1. The values $z = \pm 1.96$ are the values
such that a normally-distributed random variable Z with mean 0 and
variance 1 will satisfy $P(-1.96 \le Z \le 1.96) = 0.95$; see the figure below.

[Figure: normal density curve; the area of each tail beyond ±1.96 is 0.025.]

In other words, we have

$$P\left(-1.96 \le \frac{\overline{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95.$$

We may rearrange this and write

$$P\left(\overline{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \overline{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95.$$

Once we have calculated the mean $\overline{x}$ of n independent samples, we
obtain a specific interval $\left[\overline{x} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \overline{x} + 1.96\,\frac{\sigma}{\sqrt{n}}\right]$ which we call the
95% confidence interval for the mean $\mu$ of the given population.
Again, it’s important to realize that once the sample has been taken
and $\overline{x}$ has been calculated, there’s nothing random at all about the
above confidence interval: it’s not correct that it contains the true
mean $\mu$ with probability 95%; it either does or it doesn’t!


Of course, there’s nothing really special about the confidence level
95%; it’s just a traditionally used one. Other confidence levels
frequently used are 90% and 99%, but, of course, any confidence level
could be used. To form a 99% confidence interval from a measured
mean $\overline{x}$, we would replace the number 1.96 used above with the value
of z for which random samples from a normal population with mean 0
and standard deviation 1 would lie between ±z 99% of the time. Here,
it turns out that $z \approx 2.58$:

[Figure: normal density curve; the area of each tail beyond ±2.58 is 0.005.]
In general, the (1 − α) × 100% confidence interval for the mean is
obtained by determining the value $z_{\alpha/2}$ such that a normally-distributed
random variable Z of mean 0 and standard deviation 1 will satisfy

$$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 1 - \alpha.$$

Below are tabulated some of the more traditional values:


Confidence level (1 − α)     α        Relevant z-value z_{α/2}
0.90                         0.10     1.645
0.95                         0.05     1.960
0.98                         0.02     2.326
0.99                         0.01     2.576

In summary, the (1 − α) × 100% confidence interval for the mean is
formed from the sample mean $\overline{x}$ by constructing

$$\left[\overline{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}},\ \overline{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\right].$$

Furthermore, we expect that (1 − α) × 100 percent of the intervals so
constructed will contain the true population mean.
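The recipe just summarized is short enough to express in code. The Python sketch below is an illustration only: the function names are chosen here, the numbers in the usage lines are hypothetical, and the z-values come from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
import math
from statistics import NormalDist

def z_value(confidence_level):
    """Return z_{alpha/2} for the given confidence level 1 - alpha."""
    alpha = 1.0 - confidence_level
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

def confidence_interval(sample_mean, sigma, n, confidence_level=0.95):
    """Interval sample_mean -/+ z_{alpha/2} * sigma / sqrt(n), with sigma known."""
    margin = z_value(confidence_level) * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

if __name__ == "__main__":
    for level in (0.90, 0.95, 0.98, 0.99):
        print(level, round(z_value(level), 3))
    # Hypothetical data: sample mean 10.0 from n = 100 draws, known sigma = 2.0.
    print(confidence_interval(10.0, 2.0, 100, 0.95))
```

Raising the confidence level in the last call widens the interval, since $z_{\alpha/2}$ grows as α shrinks.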


Note, finally, that as the confidence level rises, the width of the
confidence interval also increases. This is to be expected, for the wider the
interval, the more confident we should be that it will “capture” the
true population mean!


EXERCISES 


1. Suppose that we are going to sample the random variable X =
4 × rand. Is this a normal random variable? What is the mean
and variance of X? Suppose that we instead sample $\overline{X}$, where
X = 4 × rand and $\overline{X}$ is computed by taking 50 independent samples
and forming the average. Is $\overline{X}$ close to being normally distributed?
To help in answering this question, write the following simple TI code
into your calculator:


PROGRAM: NORMCHECK
:{0} → L₁
:For(I,1,100)
:4*rand(50) → L₂
:mean(L₂) → L₁(I)
:End


A moment’s thought reveals that this program collects 100 samples
of $\overline{X}$, where each mean is computed from 50 samples, and
puts the results into the list variable L₁. Finally, draw a histogram
of these 100 samples of the mean; does it look normal? This
little experiment is tantamount to sending out 100 statisticians
and having each collect 50 independent samples and compute
the mean. The statisticians all return to combine their results into
a single histogram.


2. Suppose that you go out and collect 50 samples of the random
variable 4 × rand and compute the mean $\overline{x}$. Compute the 95%
confidence interval so obtained. Does it contain the true mean $\mu$?
(See Exercise 1, above.)


3. We can build on Exercise 2, as follows. The following simple TI
code can be used to count how many out of 100 computed 95% confidence
intervals for the mean $\mu$ of the random variable 4*rand will actually
contain the true mean (= 2):


PROGRAM: CONFINT
:0 → C
:For(I,1,100)
:4*rand(50) → L1
:mean(L1) → M
:M - .32 → L
:M + .32 → U
:C + (L<2)(2<U) → C
:End
:Disp C
:Stop


(a) What is the number .32? 
(b) What is C trying to compute? 


(c) Run this a few times and explain what’s going on. 
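
For readers without a TI calculator at hand, here is a rough Python counterpart of CONFINT. It is only a sketch: the function name and its parameters (including the optional `seed`) are hypothetical, and the margin .32 is simply copied from the program above, so the exercise questions are left for you to answer.

```python
import random
from statistics import mean

def coverage_count(trials=100, sample_size=50, margin=0.32,
                   true_mean=2.0, seed=None):
    """Count how many intervals (mean - margin, mean + margin) contain true_mean."""
    rng = random.Random(seed)
    count = 0
    for _ in range(trials):
        # One "statistician": 50 independent samples of 4*rand.
        sample = [4 * rng.random() for _ in range(sample_size)]
        m = mean(sample)
        lower, upper = m - margin, m + margin
        if lower < true_mean < upper:
            count += 1
    return count

# Run the experiment a few times, as part (c) asks.
for _ in range(3):
    print(coverage_count())
```

Passing a `seed` makes a run repeatable; leaving it out gives a fresh experiment each time, as on the calculator.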


6.4.2 Confidence intervals for the mean; unknown variance 


In this section we shall develop a method for finding confidence intervals
for the mean $\mu$ of a population when we don’t already know the variance
$\sigma^2$ of the population. In the last section our method was based on the
fact that the statistic $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ was approximately normally distributed.
In the present section, since we don’t know $\sigma$, we shall replace $\sigma^2$ with
its unbiased estimate $s_x^2$, the sample variance. We recall from
page 375 that $s_x^2$ is defined in terms of the sample by setting

\[ s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2. \]


Again, this is unbiased because the expected value of this statistic is
the population variance $\sigma^2$ (see the footnote on page 375).


We now consider the statistic $T = \frac{\bar{X} - \mu}{S_x/\sqrt{n}}$, which takes on the value
$t = \frac{\bar{x} - \mu}{s_x/\sqrt{n}}$ from a sample of size $n$. Of course, we don’t know $\mu$,
but at least we can talk about the distribution of this statistic in two
important situations, viz.,


• The sample size is small but the underlying population being sampled
from is approximately normal; or


• The sample size is large ($n > 30$).


In either of the above two situations, $T$ is called the t statistic and
has what is called the t distribution with $n - 1$ degrees of freedom.
This distribution is symmetric about 0, and its variance, $(n-1)/(n-3)$
for $n > 3$, is close to 1 when $n$ is large. If $n$ is large, then $T$ has close to a
normal distribution with mean 0 and variance 1. However, even when
$n$ is large, one usually uses the t distribution.^25

Below are the density functions for the t distribution with 2 and 10
degrees of freedom (DF). As the number of degrees of freedom tends
to infinity, the density curve approaches the normal curve with mean 0
and variance 1.


^25 Before electronic calculators were as prevalent as they are today, using the t statistic was not
altogether convenient, as the t distribution changes slightly with each increased degree of freedom.
Thus, when n > 30 one typically regarded T as normal and used the methods of the previous section
to compute confidence intervals. However, the t distribution with any number of degrees of freedom
is now readily available on such calculators as those in the TI series, making it unnecessary to use the
normal approximation (and thereby introduce additional error into the analyses).

[Figure: the normal density curve (mean = 0, variance = 1) versus the density curves for the t distribution with 2 DF and 10 DF; the values -1.96 and 1.96 are marked on the horizontal axis.]


The philosophy behind the confidence intervals where $\sigma$ is unknown
is pretty much the same as in the previous section. We first choose a
desired level of confidence $(1 - \alpha) \times 100\%$ and then choose the appropriate
level $t_{\alpha/2}$ which contains $\alpha \times 100\%$ of the population in the two
tails of the distribution. Of course, which t distribution we choose is
dependent on the size of the sample we take; as mentioned above, the
number of degrees of freedom is equal to $n - 1$, where $n$ is the sample size. These
levels are tabulated in any statistics book; as a sample we show how
they are typically displayed (a more complete table is given at the end
of this chapter):


Degrees of Freedom | $t_{.050}$ | $t_{.025}$ | $t_{.005}$
10                 | 1.812      | 2.228      | 3.169
11                 | 1.796      | 2.201      | 3.106
12                 | 1.782      | 2.179      | 3.055
13                 | 1.771      | 2.160      | 3.012


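Once we have collected the sample of size $n$ and have computed the
average $\bar{x}$ and the standard deviation $s_x$ of the sample, the
$(1 - \alpha) \times 100\%$ confidence interval becomes

\[ \left[\, \bar{x} - t_{\alpha/2}\frac{s_x}{\sqrt{n}},\ \ \bar{x} + t_{\alpha/2}\frac{s_x}{\sqrt{n}} \,\right]. \]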
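
The following Python sketch illustrates the interval $\bar{x} \pm t_{\alpha/2}\, s_x/\sqrt{n}$ at the 95% level. Python's standard library has no t quantile function, so the sketch simply copies the $t_{.025}$ column of the small table above and therefore only handles 10 to 13 degrees of freedom. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# t_{.025} values copied from the table above (degrees of freedom -> level).
T_025 = {10: 2.228, 11: 2.201, 12: 2.179, 13: 2.160}

def t_interval_95(sample):
    """95% confidence interval for the mean when the variance is unknown."""
    n = len(sample)
    t = T_025[n - 1]          # this sketch only knows df = 10, 11, 12, 13
    xbar = mean(sample)
    s = stdev(sample)         # sample standard deviation (divides by n - 1)
    margin = t * s / sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample of size 11, hence 10 degrees of freedom.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
print(t_interval_95(data))
```

For this particular sample $s_x/\sqrt{n} = 1$, so the margin of error is just the tabulated value 2.228.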
EXERCISE


1. As we have already seen, it’s possible to use your TI calculator to
generate examples (simulations) of your own, as follows. Try the
following.


(a) On your TI, invoke randNorm(10.3, 2.4, 5) → L1. What does
this command do?^26


(b) Next, use your TI calculator to compute a 95% confidence 
interval for the mean. (Use TInterval and run the Data option) 


(c) Did this interval capture the true mean? 


(d) If you were to perform this experiment 100 times, how many 
times would you expect the computed confidence interval to 
capture the true mean? 


(e) Here’s a code that will construct the above confidence interval 
and compute the number of times it captures the true mean. 
Run it and report on your findings. (The run time for this 
program on a TI-83 is about four minutes.) 


^26 This command generates a (small) sample of size 5 taken from a normal population with mean 10.3 and
standard deviation 2.4.

PROGRAM: CONFINT1
:0 → C
:Input "POP MEAN ", M
:Input "POP STD ", S
:Input "NO OF EXPER ", N
:5 → K
:For(I,1,N)
:randNorm(M,S,K) → L1
:mean(L1) → X
:(K/(K-1))(mean(L1²) - X²) → V
:2.776√(V/K) → Q
:X - Q → L
:X + Q → U
:C + (L<M)(M<U) → C
:End
:Disp C
:Stop


(f) In the above program, change the commands as follows:


Input "POP MEAN ", M      to   1/3 → M
randNorm(M,S,K) → L1      to   rand(K)² → L1
Input "POP STD ", S       to   anything (it’s now irrelevant)


Notice that this time we are taking small samples from a
highly non-normal population (rand²). Are we still capturing
the true mean (= 1/3) roughly 95% of the time?


6.4.3 Confidence interval for a population proportion


Professional pollsters love to estimate proportions: in any political race
and at virtually any time, they will take samples from the voting population
to determine whether voters prefer candidate A or candidate B.
Of course, what the pollsters are trying to determine is the overall
preference, expressed as a proportion, of the entire population. I seem to
remember reading sometime during the 2004 U.S. presidential campaign
that a Gallup Poll survey of 10,000 voters led to the prediction that
51% of the American voters preferred Kerry over Bush with a sampling
error of ±3% and a confidence level of 95%. What this means, of
course, is the essence of confidence intervals for proportions.


The methods of this section are based on the assumption that large
enough samples from an even larger binomial population are taken so
that the test statistic (the sample proportion) can be assumed to be
normally distributed. Thus, we are going to be sampling from a very large
binomial population, i.e., one with exactly two types A and B. If the
population size is $N$, then the population proportion $p$ can then be
defined to be the ratio of the number of members of type A to $N$. When
sampling from this population, we need the population size to be rather
large compared with the sample size. In practice, the sampling is typically
done without replacement, which strictly speaking would lead to a
hypergeometric distribution. However, if the population size is much larger than
the sample size, then the samples can be regarded as independent of
each other, whether or not the sampling is done with replacement.
Once a sample of size $n$ has been taken, the sample proportion $\hat{p}$ is
the statistic measuring the ratio of the number of type A members selected
to the sample size $n$.


Assume, then, that we have a large population where $p$ is the proportion
of type A members. Each time we randomly select a member
of this population, we have sampled a Bernoulli random variable $B$
whose mean is $p$ and whose variance is $p(1 - p)$. By the Central Limit
Theorem, when $n$ is large, the sum $B_1 + B_2 + \cdots + B_n$ of $n$ independent
Bernoulli random variables, each having mean $p$ and variance $p(1 - p)$,
has approximately a normal distribution with mean $np$ and variance
$np(1 - p)$. The random variable

\[ \hat{P} = \frac{B_1 + B_2 + \cdots + B_n}{n} \]

is therefore approximately normally distributed (when $n$ is large) and
has mean $p$ and variance $\frac{p(1-p)}{n}$.

If $b_1, b_2, \ldots, b_n$ are the observed outcomes, i.e.,

\[ b_i = \begin{cases} 1 & \text{if type A is observed;} \\ 0 & \text{if type B is observed,} \end{cases} \]

then the relevant test statistic is

\[ \hat{p} = \frac{b_1 + b_2 + \cdots + b_n}{n}. \]

Notice that since we don’t know $p$ (we’re trying to estimate it), we
know neither the mean nor the variance of the test statistic. With
a large enough sample, $\hat{P}$ will be approximately normally distributed
with mean $p$ and variance $p(1 - p)/n$. Therefore

\[ \frac{\hat{P} - p}{\sqrt{p(1-p)/n}} \]

will be approximately normal with mean 0 and variance 1. The problem with
the above is all of the occurrences of the unknown $p$. The remedy is to
approximate the variance $\frac{p(1-p)}{n}$ by the sample variance based on $\hat{p}$:
$\frac{\hat{p}(1-\hat{p})}{n}$. Therefore, we may regard

\[ Z = \frac{\hat{P} - p}{\sqrt{\hat{P}(1-\hat{P})/n}} \]

as being approximately normally distributed with mean 0 and variance
1. Having this we now build our $(1 - \alpha) \times 100\%$ confidence intervals
based on the values $z_{\alpha/2}$ taken from the normal distribution with mean 0
and variance 1. That is to say, the $(1 - \alpha) \times 100\%$ confidence interval
for the population proportion $p$ is

\[ \left[\, \hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \,\right]. \]


CAUTION: If we are trying to estimate a population parameter which
we know to be either very close to 0 or very close to 1, the method
above performs rather poorly unless the sample size is very large.
The reason is the highly skewed nature of a binomial population with
parameter $p$ very close to either 0 or 1, meaning that the Central Limit
Theorem will need much larger samples before the distribution starts
to become acceptably normal. A proposed modification^27 is to replace
$\hat{p}$ in the above interval by the new statistic $p^* = \frac{x+2}{n+4}$, where $x$ is the
measured number of type A members in the sample and $n$ is the sample
size. Also, the sample standard deviation $\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$ is replaced by
the expression $\sqrt{\frac{p^*(1-p^*)}{n+4}}$. The resulting confidence interval performs
better for the full range of possibilities for $p$, even when $n$ is small!
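
A short Python sketch may help to compare the two recipes. It assumes the interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$ of this section and the modification described in the cited Agresti and Coull paper, namely $p^* = (x+2)/(n+4)$ with $n + 4$ in place of $n$. The function names and the sample counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wald_interval(x, n, confidence=0.95):
    """Interval p_hat +/- z * sqrt(p_hat (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_hat = x / n
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def adjusted_interval(x, n, confidence=0.95):
    """Same recipe with p* = (x + 2)/(n + 4) and n + 4 in place of n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p_star = (x + 2) / (n + 4)
    margin = z * sqrt(p_star * (1 - p_star) / (n + 4))
    return p_star - margin, p_star + margin

# Hypothetical small sample: 1 member of type A out of 20.
print(wald_interval(1, 20))
print(adjusted_interval(1, 20))

# Hypothetical larger sample: 50 of type A out of 100.
print(wald_interval(50, 100))
print(adjusted_interval(50, 100))
```

With the larger, balanced sample the two intervals nearly coincide; with the small, lopsided one they differ noticeably, which is the situation the caution above is about.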


RULE OF THUMB: Since the methods of this section rely on the test
statistic $\frac{\hat{P} - p}{\sqrt{\hat{P}(1-\hat{P})/n}}$ being approximately normally distributed, any
sort of guidance which will help us assess this assumption will be helpful.
One commonly used rule is that if the assumption of approximate
normality is satisfied, then $\hat{p}$ ± three sample standard deviations should
both lie in the interval (0,1). Failure of this to happen indicates that
the sample size is not yet large enough to counteract the skewness in the
binomial distribution. That is to say, we may assume that the methods
of this section are valid provided that

\[ 0 < \hat{p} \pm 3\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} < 1. \]


^27 See Agresti, A., and Coull, B.A., Approximate is better than ‘exact’ for interval estimation of
binomial proportions, The American Statistician, Vol. 52, No. 2, May 1998, pp. 119-126.


6.4.4 Sample size and margin of error


In the above discussions we have seen that our confidence intervals had
the form


Estimate ± Margin of Error


at a given confidence level. We have also seen that decreasing the
margin of error also decreases the confidence level. A natural question
to ask is whether we can decrease the margin of error without at the
same time sacrificing confidence. The answer is yes: by increasing the
sample size. We flesh this out in the following example.


EXAMPLE. Suppose that we are interested in the average cost $\mu$ of a
new house in the United States in 1966, and that a random selection
of the cost of 50 homes revealed the 95% confidence interval

\[ \$20{,}116 \le \mu \le \$30{,}614, \]

along with the sample mean $\bar{x} \approx \$25{,}365$ and the estimate $\sigma \approx s_x = \$18{,}469$.
If we use this as an estimate of the population standard
deviation $\sigma$, then we see that a $(1 - \alpha) \times 100\%$ confidence interval
becomes

\[ \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}. \]

We see also that the margin of error associated with the above estimate
is one-half the width of the above interval, viz., $5,249.


QUESTION: Suppose that we wish to take a new sample of new houses
and obtain a confidence interval for $\mu$ with the same level of confidence
(95%) but with a margin of error of at most $3,000. How large a sample
do we need?


SOLUTION. This is easy, for we wish to choose $n$ to make the margin
of error no more than $3,000:

\[ z_{.025}\frac{\sigma}{\sqrt{n}} \le \$3{,}000. \]

Using $z_{.025} = 1.96$ and $\sigma \approx \$18{,}469$ we quickly arrive at

\[ n \ge \left(\frac{1.96 \times 18{,}469}{3{,}000}\right)^2 \approx 145.6. \]

That is to say, if we take a sample of at least 146 houses, then we will have
narrowed the margin of error to no more than $3,000 without sacrificing
any confidence.
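
The arithmetic of this solution can be checked with a few lines of Python. This is just a sketch of the inequality $n \ge (z\sigma/E)^2$, where $E$ denotes the desired margin of error; the function name is hypothetical and the numbers are those of the example.

```python
from math import ceil, sqrt

def sample_size_for_mean(z, sigma, max_margin):
    """Smallest n with z * sigma / sqrt(n) <= max_margin."""
    return ceil((z * sigma / max_margin) ** 2)

n = sample_size_for_mean(1.96, 18469, 3000)
print(n)                         # 146

# Margin of error actually achieved with n houses, and with one house fewer.
print(1.96 * 18469 / sqrt(n))
print(1.96 * 18469 / sqrt(n - 1))
```

The last two lines show that 146 is the threshold: with 145 houses the margin of error would still exceed $3,000.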

We can similarly determine the sample sizes needed to achieve a given bound on
the margin of error in the case of confidence intervals for proportions,
as follows. In this case the margin of error for a confidence interval
with confidence $(1 - \alpha) \times 100\%$ is $z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, where, as usual, $\hat{p}$
is the sampled population proportion. A very useful approximation is
obtained by noting that since $0 \le \hat{p} \le 1$, then $0 \le \hat{p}(1 - \hat{p}) \le \frac{1}{4}$.
Therefore, if we wish for the margin of error to be no more than a given
bound $B$, all we need is a sample size $n$ satisfying

\[ n \ge \left(\frac{z_{\alpha/2}}{2B}\right)^2, \]

because regardless of the sampled value $\hat{p}$ we see that

\[ B \ge \frac{z_{\alpha/2}}{2\sqrt{n}} \ge z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}. \]
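
As a small illustration, the bound $n \ge (z_{\alpha/2}/(2B))^2$ can be evaluated in Python as follows. The function name and the target values (margins of 3% and 5% at 95% confidence) are hypothetical and are not taken from the text.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(bound, confidence=0.95):
    """Smallest n with z / (2 sqrt(n)) <= bound, i.e. n >= (z / (2 bound))^2."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z / (2 * bound)) ** 2)

# Hypothetical targets: margin of error at most 3% (then 5%) at 95% confidence.
print(sample_size_for_proportion(0.03))   # 1068
print(sample_size_for_proportion(0.05))   # 385
```

Because the bound uses the worst case $\hat{p}(1 - \hat{p}) = 1/4$, these sample sizes are safe whatever proportion the sample eventually shows.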


EXERCISES 


1. Assume that we need to estimate the mean diameter of a very 
critical bolt being manufactured at a given plant. Previous studies 
show that the machining process results in a standard deviation 
of approximately 0.012 mm. Estimate the sample size necessary 
to compute a 99% confidence interval for the mean bolt diameter 
with a margin of error of no more than 0.003 mm. 


2. Assume that a polling agency wishes to survey the voting public to
estimate the percentage of voters who prefer candidate A. What
they seek is a sampling error of no more than 2% at a confidence
level of 98%. Find a minimum sample size which will guarantee
this level of confidence and precision.


6.5 Hypothesis Testing of Means and Proportions 


Suppose we encounter the claim by a manufacturer that the precision
bolts of Exercise 1 above have a mean diameter of 8.1 mm and that we are
to test the accuracy of this claim. This claim can be regarded as a
hypothesis and it is up to us as statisticians to decide whether or
not to reject this hypothesis. The above hypothesis is usually called
the null hypothesis and is an assertion about the mean $\mu$ of the
population of manufactured bolts. This is often written

\[ H_0 : \mu = 8.1. \]

We have no a priori reason to believe otherwise, unless, of course, we
can find a significant reason to reject this hypothesis. In hypothesis
testing, one typically doesn’t accept a null hypothesis; one
rejects (or doesn’t reject) it on the basis of statistical evidence.


We can see there are four different outcomes regarding the hypothesis 
and its rejection. A type I error occurs when a true null hypothesis 
is rejected, and a type II error occurs when we fail to reject a false 
null hypothesis. These possibilities are outlined in the table below. 


                     | $H_0$ is true    | $H_0$ is false
Reject $H_0$         | Type I error     | Correct decision
Do not reject $H_0$  | Correct decision | Type II error


Perhaps a useful comparison can be made with the U.S. system of
criminal justice. In a court of law a defendant is presumed innocent
(the null hypothesis), unless proved guilty (“beyond a reasonable
doubt”). Convicting an innocent person is then tantamount to making
a type I error. Failing to convict a guilty person is a type II error.
Furthermore, the language used is strikingly similar to that used in
statistics: the defendant is never found “innocent,” rather, he is merely
found “not guilty.”


It is typical to define the following conditional probabilities:

\[ \alpha = P(\text{rejecting } H_0 \mid H_0 \text{ is true}), \qquad \beta = P(\text{not rejecting } H_0 \mid H_0 \text{ is false}). \]

Notice that (for a fixed sample size) as $\alpha$ becomes smaller, $\beta$ becomes
larger, and vice versa. Again, in the U.S. criminal justice system, it is
assumed (or at least hoped) that $\alpha$ is very small, which means that $\beta$
can be large (too large for many people’s comfort).


Let’s move now to a simple, but relatively concrete example. Assume
that a sample of 60 bolts was gathered from the manufacturing plant
whose claim was that the bolts they produce have a mean diameter of
8.1 mm. Suppose that you knew that the standard deviation of the
bolts was $\sigma = 0.04$ mm. (As usual, it’s unreasonable to assume that
you would know this in advance!) The result of the sample of 60 bolts
is that $\bar{x} = 8.117$. This doesn’t look so bad; what should you do?


We proceed by checking how significantly this number is away from
the mean, as follows. First, notice that the test statistic (a random
variable!)

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{60}}, \]

where $\mu$ represents the hypothesized mean, will be approximately
normally distributed with mean 0 and variance 1. The observed value of
this test statistic is then

\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{60}} = \frac{8.117 - 8.1}{0.04/\sqrt{60}} \approx 3.29. \]

Whoa! Look at this number; it’s over three standard deviations
away from the mean of $Z$ and hence is way out in the right-hand tail of
the normal distribution.^28 The probability for us to have gotten such
a large number under the assumption that $H_0 : \mu = 8.1$ is
true is very small (roughly .1%). This suggests strongly that we reject
this null hypothesis!


^28 The probability P(|Z| > 3.29) of measuring a value this far from the mean is often called the
P-value of the outcome of our measurement, and the smaller the P-value, the more significance we
attribute to the result. In this particular case, P(|Z| > 3.29) ≈ 0.001, which means that before
taking our sample and measuring the sample mean, the probability that we would have gotten
something this far from the true mean is roughly one-tenth of one percent!
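
The two numbers quoted in this example (the observed value 3.29 and the P-value of roughly 0.001) can be reproduced with the following Python sketch. The function names are hypothetical; the inputs are those of the bolt example.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(xbar, mu0, sigma, n):
    """Observed value of (Xbar - mu0) / (sigma / sqrt(n))."""
    return (xbar - mu0) / (sigma / sqrt(n))

def two_sided_p_value(z):
    """P(|Z| > |z|) for a standard normal random variable Z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z = z_statistic(8.117, 8.1, 0.04, 60)
print(round(z, 2))                      # 3.29
print(round(two_sided_p_value(z), 4))   # roughly 0.001
```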

Continuing the above example, assume more realistically that we
didn’t know in advance the variance of the population of bolts, but
that in the sample of 60 bolts we measure a sample standard deviation
of $s_x = .043$. In this case the sample statistic

\[ T = \frac{\bar{X} - \mu}{S_x/\sqrt{60}} \]

has the t distribution with 59 degrees of freedom (hence is very nearly
normal). The observed value of this sample statistic is

\[ t = \frac{\bar{x} - \mu}{s_x/\sqrt{60}} \approx 3.06. \]

As above, obtaining this result would be extremely unlikely if the
hypothesis $H_0 : \mu = 8.1$ were true.


Having treated the above two examples informally, we shall, in the
subsequent sections, give a slightly more formal treatment. As we did
with confidence intervals, we divide the treatment into the cases of
known and unknown variances, taking them up individually in the next
two sections.


EXERCISES 


1. In leaving for school on an overcast April morning you make a
judgement on the null hypothesis: The weather will remain dry.
The following choices itemize the results of making type I and type
II errors. Exactly one is true; which one?


(A) Type I error: get drenched
    Type II error: needlessly carry around an umbrella


(B) Type I error: needlessly carry around an umbrella
    Type II error: get drenched


(C) Type I error: carry an umbrella, and it rains
    Type II error: carry no umbrella, but weather remains dry


(D) Type I error: get drenched
    Type II error: carry no umbrella, but weather remains dry


(E) Type I error: get drenched
    Type II error: carry an umbrella, and it rains


2. Mr. Surowski’s grading policies have come under attack by the
Central Administration as well as by the Board of Directors of
SAS. To analyze the situation, a null hypothesis together with an
alternative hypothesis have been formulated:


$H_0$: Mr. Surowski’s grading policies are fair
$H_a$: Mr. Surowski plays favorites in awarding grades.


The Board of Directors finds no irregularities, and therefore takes
no actions against him, but the rumor among the students is that
it is advantageous for Mr. Surowski’s students to regularly give
him chocolate-covered espresso coffee beans. If the rumors are
true, has an error been made? If so, which type of error?


3. An assembly-line machine produces precision bolts designed to
have a mean diameter of 8.1 mm. Each morning the first 50 bolts
produced are pulled and measured. If their mean diameter
is under 7.8 mm or over 8.4 mm, the machinery is stopped and
the foreman is called on to make adjustments before production is
resumed. The quality control procedure may be viewed as a hypothesis
test with the null hypothesis $H_0 : \mu = 8.1$. The engineer
is asked to make adjustments when the null hypothesis is rejected.
In test terminology, what would be the result of a Type II error
(choose one)?


(A) A warranted halt in production to adjust the machinery
(B) An unnecessary stoppage of the production process
(C) Continued production of wrong size bolts
(D) Continued production of proper size bolts
(E) Continued production of bolts that randomly are the right or
    wrong size


6.5.1 Hypothesis testing of the mean; known variance


Throughout this and the next section, the null hypothesis will have the
form $H_0 : \mu = \mu_0$. However, in the course of rejecting this hypothesis,
we shall consider one- and two-sided alternative hypotheses. The
one-sided alternatives have the form $H_a : \mu < \mu_0$ or $H_a : \mu > \mu_0$. The
two-sided alternatives have the form $H_a : \mu \ne \mu_0$.


A one-sided alternative is appropriate in cases where the null
hypothesis is $H_0 : \mu = \mu_0$ but where anything $\le \mu_0$ is acceptable (or
where anything $\ge \mu_0$ is acceptable). This leads to two possible sets of
hypotheses:

\[ H_0 : \mu = \mu_0, \quad H_a : \mu < \mu_0, \]

or

\[ H_0 : \mu = \mu_0, \quad H_a : \mu > \mu_0. \]


EXAMPLE 1. Suppose that a manufacturer of a mosquito repellant 
claims that the product remains effective for (at least) six hours. In 
this case, anything greater than or equal to six hours is acceptable and 
so the appropriate hypotheses are 


\[ H_0 : \mu = 6, \quad H_a : \mu < 6. \]


Therefore, a one-sided alternative is being used here. 


EXAMPLE 2. In the example of precision bolts discussed above, large 
deviations on either side of the mean are unacceptable. Therefore, a 
two-sided alternative is appropriate: 


\[ H_0 : \mu = \mu_0, \quad H_a : \mu \ne \mu_0. \]


Next, one decides on a criterion by which $H_0$ is to be rejected; that
is to say, one decides on the probability $\alpha$ of making a Type I error.
(Remember, the smaller this probability becomes, the larger the probability
$\beta$ becomes of making a Type II error.) The most typical rejection level
is $\alpha = 5\%$. As mentioned above, the test statistic becomes

\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}. \]

This is normally distributed with mean 0 and variance 1. The 5%
rejection region is dependent upon the alternative hypothesis. It’s
easiest just to draw these:


[Figures: normal density curves with the 5% rejection regions shaded. For the two-sided alternative $H_0 : \mu = \mu_0$, $H_a : \mu \ne \mu_0$, the rejection region consists of both tails, with total area 0.05. For the one-sided alternatives the rejection region is a single tail of area 0.05, bounded by -1.64 (left tail) or 1.64 (right tail).]
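
The rejection regions pictured here can also be described in code. The Python sketch below decides whether an observed value $z$ of the test statistic falls in the rejection region for each kind of alternative. The function name and the labels "less", "greater" and "two-sided" are hypothetical conventions, not part of the text; the boundary for the two-sided case at $\alpha = 0.05$ is the familiar 1.960 from the table of z-values in Section 6.4.

```python
from statistics import NormalDist

def reject_null(z, alpha=0.05, alternative="two-sided"):
    """Return True if the observed z lies in the rejection region.

    alternative is "less" (Ha: mu < mu0), "greater" (Ha: mu > mu0),
    or "two-sided" (Ha: mu != mu0).
    """
    std_normal = NormalDist()
    if alternative == "less":
        return z < std_normal.inv_cdf(alpha)               # about -1.645 for alpha = 0.05
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)           # about 1.645
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)  # about 1.960
    raise ValueError("unknown alternative: " + alternative)

print(reject_null(3.29))                          # True
print(reject_null(1.80))                          # False
print(reject_null(1.80, alternative="greater"))   # True
```

The last two calls show why the choice of alternative matters: the same observed value 1.80 is significant against a one-sided alternative but not against the two-sided one.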


6.5.2 Hypothesis testing of the mean; unknown variance 


In this setting, the formulation of the null and alternative hypotheses
doesn’t change. What changes is the test statistic:

\[ T = \frac{\bar{X} - \mu}{S_x/\sqrt{n}}. \]

This has the t-distribution with n — 1 degrees of freedom in either of 
the two cases itemized on page 386, namely either when we’re sampling 
from an approximately normal population or when the sample size is 
reasonably large. As in the previous section, the rejection regions at 
the a level of significance are determined on the basis of the alternative 
hypothesis. Furthermore, unless one implements the test automatically 
(as on a TI calculator), in finding the boundary of the rejection region 
one needs to consider the number of degrees of freedom of the t statistic. 


6.5.3 Hypothesis testing of a proportion


If we encounter the claim that at least 55% of the American
voting public prefers candidate A over candidate B, then a reasonable
set of hypotheses to test is

\[ H_0 : p = .55, \quad H_a : p < .55. \]

The test statistic would then be

\[ Z = \frac{\hat{P} - p}{\sqrt{\hat{P}(1-\hat{P})/n}}, \]

which, if $n$ is large enough, is approximately normally distributed.
Therefore testing the hypothesis at the $\alpha$ level of significance can be
handled in the usual fashion.
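
Here is a minimal Python sketch of such a test, assuming the statistic $(\hat{p} - p_0)/\sqrt{\hat{p}(1-\hat{p})/n}$ used for the confidence intervals of Section 6.4.3 (many texts put the hypothesized $p_0$ in the denominator instead). The function name and the survey numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(x, n, p0):
    """Observed value of (p_hat - p0) / sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = x / n
    return (p_hat - p0) / sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 520 of 1000 voters prefer candidate A; the claim is p >= 0.55.
z = proportion_z(520, 1000, 0.55)
p_value = NormalDist().cdf(z)    # left-tail probability, for Ha: p < 0.55
print(round(z, 2), round(p_value, 3))
```

A small left-tail probability here would count as evidence against the claim that at least 55% prefer candidate A.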


6.5.4 Matched pairs 


One of the most frequent uses of statistics comes in evaluating the effect 
of one or more treatments on a set of subjects. For example, people 
often consider the effects of listening to Mozart while performing an 
intellectual task (such as a test). In the same vein, one may wish to 
compare the effects of two insect repellants. 

To effect such comparisons, there are two basic—but rather different— 
experimental designs which could be employed. The first would be to 
divide a group of subjects into two distinct groups and apply the dif- 
ferent “treatments” to the two groups. For example, we may divide 
the group of students taking a given test into groups A and B, where 
group A listens to Mozart while taking the test, whereas those in group 
B do not. Another approach to comparing treatments is to successively 
apply the treatments to the same members of a given group; this design 
is often called a matched-pairs design. In comparing the effects of
listening to Mozart, we could take the same group of students and allow
them to listen to Mozart while taking one test and then at another time
have them take a similar test without listening to Mozart.

In such situations, we would find ourselves comparing $\mu_1$ versus $\mu_2$,
where $\mu_1, \mu_2$ represent means associated with treatments 1 and 2. The
sensible null hypothesis would be expressed as $H_0 : \mu_1 = \mu_2$ and the
alternative will be either one or two sided, depending on the situation.

Without delving into the pros and cons of the above two designs,
suffice it to say that the statistics used are different. The first design
represents taking two independent samples; we won’t go into the appropriate
statistic for evaluating $H_0$. On the other hand, the matched-pairs
design is easier; all one needs to do is to compute the difference of the
two effects. If $X$ is the random variable representing the difference, and
if $\mu_X$ is the mean of $X$, then we are simply evaluating $H_0 : \mu_X = 0$
against an appropriately-chosen alternative. The methods of the above
sections apply immediately.
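
To make the matched-pairs recipe concrete, the Python sketch below forms the differences and computes the t statistic $\bar{d}/(s_d/\sqrt{n})$ for the null hypothesis that the mean difference is 0. The function name and the test scores are hypothetical, invented only for illustration.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(first, second):
    """t statistic and degrees of freedom for H0: mean difference = 0."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1

# Hypothetical scores for the same five students, with and without Mozart.
with_mozart = [81, 77, 90, 68, 85]
without_mozart = [80, 75, 87, 64, 80]
t, df = paired_t_statistic(with_mozart, without_mozart)
print(round(t, 2), df)
```

The resulting value would then be compared with the appropriate level of the t distribution with 4 degrees of freedom, exactly as in Section 6.5.2.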


EXERCISES 


1. The TI calculator command randNorm(0, 1) generates a random
number from a normal distribution with mean 0 and variance 1.
Do you believe this? The command


randNorm(0, 1, 100) → L1


will put 100 independent samples of this distribution into the list
variable $L_1$. Test the hypothesis $\mu = 0$ at the 1% significance
level.


2.^29 Sarah cycles to work and she believes that the mean time taken
to complete her journey is 30 minutes. To test her belief, she 
records the times (in minutes) taken to complete her journey over 
a 10-day period as follows: 


30.1 32.3 33.6 29.8 28.9 30.6 31.1 30.2 32.1 29.4 


Test Sarah’s belief, at the 5% significance level. 


3. Suppose it is claimed that 80% of all SAS graduating seniors go 
on to attend American Universities. Set up null and alternative 
hypotheses for testing this claim. 


4. Suppose instead it is claimed that at least 80% of all SAS gradu- 
ating seniors go on to attend American Universities. Set up null 
and alternative hypotheses for testing this claim. 


^29 From IB Mathematics HL Examination, May 2006, Paper 3 (Statistics and Probability), #3.


5. Suppose that a coin is tossed 320 times, with the total number of
“heads” being 140. At the 5% level, should the null hypothesis
that the coin is fair be rejected?


6. A candidate claims that he has the support of at least 54% of the
voting public. A random survey of 1000 voters reveals that among
those sampled, this candidate only had the support of 51%. How
would you report these results?


7. Ten healthy subjects had their diastolic blood pressures measured
before and after a certain treatment. Evaluate the null hypothesis
that there was no change against the alternative that the blood
pressure was lowered as a result of the treatment. Use a 5%
significance level.


Diastolic Blood Pressure
Before Treatment | 83 | 89 | 86 | 91 | 84 | 91 | 88 | 90 | 86 | 90 |
After Treatment  | 77 | 83 | 85 | 92 | 85 | 86 | 91 | 88 | 88 | 83 |


8. A growing number of employers are trying to hold down the costs
that they pay for medical insurance for their employees. As part of
this effort, many medical insurance companies are now requiring 
clients to use generic-brand medicines when filling prescriptions. 
An independent consumer advocacy group wanted to determine 
if there was a difference, in milligrams, in the amount of active 
ingredient between a certain “name” brand drug and its generic 
counterpart. Pharmacies may store drugs under different condi- 
tions. Therefore, the consumer group randomly selected ten differ- 
ent pharmacies in a large city and filled two prescriptions at each 
of these pharmacies, one of the “name” brand and the other for 
the generic brand of the same drug. The consumer group’s labora- 
tory then tested a randomly selected pill from each prescription to 
determine the amount of active ingredient in the pill. The results 
are given in the table below. 

Active Ingredient (in milligrams)
Pharmacy      |   1 |   2 |   3 |   4 |   5 |   6 |   7 |   8 |   9 |  10 |
Name brand    | 245 | 244 | 240 | 250 | 243 | 246 | 246 | 246 | 247 | 250 |
Generic brand | 246 | 240 | 235 | 237 | 243 | 239 | 241 | 238 | 238 | 234 |


Based on the above data, what should be the consumer group’s 
laboratory report about the difference in the active ingredient in 
the two brands of pills? Give appropriate statistical evidence to 
support your response. 


6.6 $\chi^2$ and Goodness of Fit


Perhaps as an oversimplification, the $\chi^2$ statistic gives us a means for
measuring the discrepancy between how we feel something ought to
behave versus how it actually appears to behave. In order to flesh
out this very cryptic characterization, suppose we have a die which we
believe to be fair, and roll it 200 times, with the outcomes as follows:


Outcome            |  1 |  2 |  3 |  4 |  5 |  6 |
No. of occurrences | 33 | 40 | 39 | 28 | 36 | 24 |


Does this appear to be the way a fair die should behave? Is there a
statistic appropriate for us to measure whether its discrepancy from
“fairness” is significant?

Notice that the underlying null hypothesis would be that the die is
fair, expressed in the form

\[ H_0 : p_1 = \tfrac{1}{6},\ p_2 = \tfrac{1}{6},\ p_3 = \tfrac{1}{6},\ p_4 = \tfrac{1}{6},\ p_5 = \tfrac{1}{6},\ p_6 = \tfrac{1}{6} \]

(where the probabilities have the obvious definitions) versus the
alternative

\[ H_a : \text{at least one of the proportions exceeds } \tfrac{1}{6}. \]

The appropriate test statistic, sometimes called the $\chi^2$ statistic, is
given by

\[ \chi^2 = \sum \frac{(n_i - E(n_i))^2}{E(n_i)}, \]

where the sum is over each of the $k$ possible outcomes (in this case, $k = 6$),
where $n_i$ is the number of times we observe outcome $i$, and where
$E(n_i)$ is the expected number of times we would observe the outcome
under the null hypothesis. Thus, in the present case, we would have
$E(n_i) = 200/6$, $i = 1, 2, \ldots, 6$, and $n_1 = 33, n_2 = 40, \ldots, n_6 = 24$. Of
course, just because we have denoted this sum by $\chi^2$ doesn’t already
guarantee that it has a $\chi^2$ distribution. Checking that it really does
would again take us into much deeper waters. However, if we consider
the simplest of all cases, namely when there are only two categories,
then we can argue that the distribution of the above statistic really is
approximately $\chi^2$ (and with one degree of freedom).

When there are only two categories, then of course we’re really doing 
a binomial experiment. Assume, then, that we make $n$ measurements 
(or “trials”) and that the probability of observing an outcome 
falling into category 1 is $p$. This would imply that if $n_1$ is the number 
of observations in category 1, then $E(n_1) = np$. Likewise, if 
$n_2$ is the number of observations in category 2, then $n_2 = n - n_1$ and 
$E(n_2) = n(1 - p)$. 

In this case, our sum takes on the appearance 

$$
\begin{aligned}
\chi^2 = \frac{(n_1 - E(n_1))^2}{E(n_1)} + \frac{(n_2 - E(n_2))^2}{E(n_2)}
 &= \frac{(n_1 - np)^2}{np} + \frac{(n_2 - n(1-p))^2}{n(1-p)} \\
 &= \frac{(n_1 - np)^2}{np} + \frac{(n - n_1 - n(1-p))^2}{n(1-p)} \\
 &= \frac{(n_1 - np)^2}{np} + \frac{(n_1 - np)^2}{n(1-p)} \\
 &= \frac{(n_1 - np)^2}{np(1-p)}.
\end{aligned}
$$


However, we may regard $n_1$ as the observed value of a binomial random 
variable $N_1$ with mean $np$ and standard deviation $\sqrt{np(1-p)}$; furthermore, 
if $n$ is large enough, then $N_1$ is approximately normal. Therefore 

$$Z = \frac{N_1 - np}{\sqrt{np(1-p)}}$$

is approximately normally distributed with mean 0 and standard deviation 1. 
This means that 

$$Z^2 = \frac{(N_1 - np)^2}{np(1-p)}$$

has approximately the $\chi^2$ distribution with one degree of freedom, 
completing the argument in this very special case. 
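
The algebra in the two-category case can be checked numerically. The following is a minimal Python sketch, not part of the original notes: the function names `two_term_sum` and `single_fraction` and the sample values of $n_1$, $n$ and $p$ are illustrative assumptions. It uses exact rational arithmetic to compare the two-term sum with the single fraction $(n_1 - np)^2 / (np(1-p))$.

```python
from fractions import Fraction

def two_term_sum(n1, n, p):
    """Sum of (observed - expected)^2 / expected over the two categories."""
    n2 = n - n1
    e1, e2 = n * p, n * (1 - p)
    return (n1 - e1) ** 2 / e1 + (n2 - e2) ** 2 / e2

def single_fraction(n1, n, p):
    """Closed form (n1 - np)^2 / (np(1 - p)) reached at the end of the derivation."""
    return (n1 - n * p) ** 2 / (n * p * (1 - p))

# Hypothetical test values of (n1, n, p); Fractions keep the comparison exact.
cases = [(60, 100, Fraction(1, 2)),
         (33, 200, Fraction(1, 6)),
         (7, 25, Fraction(3, 10))]

for n1, n, p in cases:
    assert two_term_sum(n1, n, p) == single_fraction(n1, n, p)
    print(n1, n, p, single_fraction(n1, n, p))
```

Agreement on a few cases is of course not a proof; it only guards against a slip in the algebra.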


EXAMPLE 1. Let’s flesh out the above in a very simple hypothesis-testing 
context. That is, suppose that someone hands you a coin and 
tells you that it is a fair coin. This leads you to test the hypotheses 

$$H_0:\ p = 1/2 \quad\text{against the alternative}\quad H_a:\ p \neq 1/2,$$

where $p$ is the probability that the coin lands on heads on any given 
toss. 


To test this you might then toss the coin 100 times. Under the null 
hypothesis, we would have $E(n_H) = 50$, where $n_H$ is the number of 
heads (the random variable in this situation) observed in 100 tosses. 
Assume that as a result of these 100 tosses, we get $n_H = 60$, and so 
$n_T = 40$, where, obviously, $n_T$ is the number of tails. We plug into the 
above $\chi^2$ statistic, obtaining 

$$\chi^2 = \frac{(60-50)^2}{50} + \frac{(40-50)^2}{50} = 4.$$


So what is the P-value of this result? As usual, this is the probability 
$P(\chi^2 \ge 4)$, which is the area under the $\chi^2$-density curve for $x \ge 4$: 

[Figure: Chi-Squared (1 DF) density curve, with the area to the right of $x = 4$ shaded; Area = 0.0455] 

(The above calculation can be done on your TI-83, using 1 − χ²cdf(0, 4, 1). 
The third argument, 1, is just the number of degrees of freedom.) 
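
For readers without a TI-83, the same tail area can be sketched in Python using only the standard library. This is an illustration under stated assumptions: the helper names `chi2_two_categories` and `p_value_one_df` are hypothetical, and the P-value formula relies on the fact argued above that, with one degree of freedom, $\chi^2 = Z^2$, so that $P(\chi^2 \ge x) = P(|Z| \ge \sqrt{x})$, which can be written with the complementary error function as erfc$(\sqrt{x/2})$.

```python
import math

def chi2_two_categories(n1, n, p):
    """Chi-square statistic for n trials with n1 outcomes in category 1."""
    expected = (n * p, n * (1 - p))
    observed = (n1, n - n1)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def p_value_one_df(x):
    """P(chi-square with 1 d.f. >= x) = P(|Z| >= sqrt(x)) for a standard normal Z."""
    return math.erfc(math.sqrt(x / 2.0))

# The coin of Example 1: 60 heads in 100 tosses, null hypothesis p = 1/2.
stat = chi2_two_categories(60, 100, 0.5)
print(stat, round(p_value_one_df(stat), 4))
```

This shortcut applies only when there is one degree of freedom; with more categories a general $\chi^2$ tail probability is needed.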


Since the P-value is .0455, one sees that there is a fair amount of 
significance that can be attached to this outcome. We would, at the 
5% level of significance, reject $H_0$ and say that the coin is not fair. 


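EXAMPLE 2. Let’s return to the case of the allegedly fair die and the 
results of the 200 tosses. The $\chi^2$ test results in the value: 

$$
\begin{aligned}
\chi^2 &= \sum \frac{(n_i - E(n_i))^2}{E(n_i)} \\
 &= \frac{\left(33 - \frac{200}{6}\right)^2}{200/6} + \frac{\left(40 - \frac{200}{6}\right)^2}{200/6} + \frac{\left(39 - \frac{200}{6}\right)^2}{200/6} \\
 &\quad + \frac{\left(28 - \frac{200}{6}\right)^2}{200/6} + \frac{\left(36 - \frac{200}{6}\right)^2}{200/6} + \frac{\left(24 - \frac{200}{6}\right)^2}{200/6} = 5.98.
\end{aligned}
$$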


The P-value corresponding to this measured value is $P(\chi^2_5 \ge 5.98) \approx 0.308$. 
(We sometimes write $\chi^2_n$ for the $\chi^2$ random variable with $n$ degrees 
of freedom.) This is really not small enough (not “significant 
enough”) for us to reject the null hypothesis of the fairness of the die. 

[Figure: $\chi^2$ (5 df) density curve; P-value = area = 0.308] 


In general, experiments of the above type are called multinomial 
experiments, which generalize in a natural way the familiar binomial 
experiments. A multinomial experiment results in a number of 
occurrences in each of possibly many categories; the binomial experiment 
results in a number of occurrences in each of only two categories. 
The result of a multinomial experiment is typically summarized in a 
one-way table, the table on page 405 being a good example. The $\chi^2$ 
test used to test the null hypothesis regarding the individual category 
probabilities is often referred to as a test for homogeneity. 


The TI-83 calculators are quite adept at a variety of tests of hypotheses; 
however, they don’t have a built-in code to test homogeneity 
hypotheses.³⁰ (They do, however, have a built-in code to test for independence, 
which we’ll consider below.) At any rate, here’s a simple 
code which will test for homogeneity. Preparatory to running this program, 
one puts into list variable L1 the observed counts and into list 
variable L2 the expected counts. For the problem of the putative fair 
die, the entries 33, 40, 39, 28, 36, 24 are placed in L1 and the entry 
200/6 = 33.333 is placed into the first six entries of L2. 


The following simple code will finish the job: 


PROGRAM: CHISQ 
:Input “DF”, N 
:0 → S 
:For(I,1,N+1) 
:S + (L1(I) − L2(I))²/(L2(I)) → S 
:End 
:1 − χ²cdf(0,S,N) → P 
:Disp “CHISQ:”, S 
:Disp “P-VALUE”, P 


³⁰ This was remedied on the TI-84s. 


Running the above program (using the fact that there are 5 degrees 
of freedom) results in the output: 


CHISQ: 5.98 
P-VALUE: .308 
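
A Python analogue of the CHISQ program may help readers who want to experiment away from the calculator. This is a sketch under stated assumptions, not the author's code: the names `chi2_statistic` and `chi2_upper_tail` are hypothetical, and in place of the calculator's χ²cdf the tail probability is computed from a standard recurrence for the regularized upper incomplete gamma function, which is valid here only because the number of degrees of freedom is a positive integer.

```python
import math

def chi2_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def chi2_upper_tail(x, df):
    """P(chi-square with df degrees of freedom >= x), for a positive integer df.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1), where Q is the
    regularized upper incomplete gamma function, starting from
    Q(1/2, y) = erfc(sqrt(y)) for odd df or Q(1, y) = exp(-y) for even df.
    """
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

observed = [33, 40, 39, 28, 36, 24]   # plays the role of list L1
expected = [200 / 6] * 6              # plays the role of list L2
df = len(observed) - 1                # the value entered at the DF prompt

stat = chi2_statistic(observed, expected)
p_value = chi2_upper_tail(stat, df)
print(round(stat, 2), round(p_value, 3))
```

As in the TI program, the number of categories is one more than the number of degrees of freedom, which is why the loop there runs from 1 to N+1.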


Example 3.³¹ Suppose that before a documentary was aired on public 
television, it had been determined that 7% of the viewing public 
favored legalization of marijuana, 18% favored decriminalization (but 
not legalization), 65% favored the existing laws, and 10% had no opinion. 
After the documentary was aired, a random sample of 500 viewers 
revealed the following opinions, summarized in the following one-way 
table: 


Distribution of Opinions About Marijuana Possession 
Legalization | Decriminalization | Existing Laws | No Opinion 
39           | 99                | 336           | 26 


Running the above TI code yielded the following output: 


CHISQ: 13.24945055 
P-VALUE: .0041270649 
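
The expected counts for this example come from multiplying the pre-documentary proportions by the sample size. The short Python sketch below (an illustration only; the variable names are assumptions) shows how those expected counts and the resulting statistic can be obtained.

```python
# Pre-documentary proportions (the null hypothesis) and the observed counts.
null_proportions = [0.07, 0.18, 0.65, 0.10]
observed = [39, 99, 336, 26]

n = sum(observed)                                # 500 viewers
expected = [n * p for p in null_proportions]     # the expected counts (list L2)
stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                           # four categories, so 3 d.f.

print([round(e, 1) for e in expected], round(stat, 4), df)
```

Most of the statistic comes from the “No Opinion” category, where 26 viewers were observed against 50 expected.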


This tells us that there is a significant departure from the pre-existing 
proportions, suggesting that the documentary had a significant effect 
on the viewers! 


³¹ This example comes from STATISTICS, Ninth edition, James T. McClave and Terry Sincich, 
Prentice Hall, 2003, page 710. (This is the text we use for our AP Statistics course.) 


6.6.1 $\chi^2$ tests of independence; two-way tables 


Students who have attended my classes will probably have heard me 
make a number of rather cavalier (sometimes even reckless) statements. 
One that I’ve often made, despite having only anecdotal evidence, is 
that among students having been exposed to both algebra and geometry, 
girls prefer algebra and boys prefer geometry. Now suppose that we 
go out and put this to a test, taking a survey of 300 students which 
results in the following two-way contingency table³²: 


                              Gender 
Subject Preference | Male | Female | Totals 
Prefers Algebra    | 69   | 86     | 155 
Prefers Geometry   | 78   | 67     | 145 
Totals             | 147  | 153    | 300 


Inherent in the above table are two categorical random variables 
$X$ = gender and $Y$ = subject preference. We’re trying to assess the independence 
of the two variables, which would form our null hypothesis, 
versus the alternative that there is a gender dependency on the subject 
preference. 


In order to make the above more precise, assume, for the sake of 
argument, that we knew the exact distributions of $X$ and $Y$, say that 

$$P(X = \text{male}) = p, \quad\text{and}\quad P(Y \text{ prefers algebra}) = q.$$

If $X$ and $Y$ are really independent, then we have equations such as 

$$P(X = \text{male and } Y \text{ prefers algebra}) = P(X = \text{male}) \cdot P(Y \text{ prefers algebra}) = pq.$$


Given this, we would expect that among the 300 students sampled, 
roughly $300pq$ would be males and prefer algebra. Given that the actual 
number in this category was found to be 69, the contribution to 
the $\chi^2$ statistic would be 

$$\frac{(69 - 300pq)^2}{300pq}.$$

Likewise, there would be three other contributions to the $\chi^2$ statistic, 
one for each “cell” in the above table. 


³² These numbers are hypothetical; I just made them up! 


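However, it’s unlikely that we know the parameters of either $X$ or 
$Y$, so we use the data in the table to estimate these quantities. Clearly, 
the most reasonable estimate of $p$ is $\hat{p} = \frac{147}{300}$ and the most reasonable 
estimate for $q$ is $\hat{q} = \frac{155}{300}$. This says that the estimated expected count 
in the Male/Algebra category becomes 

$$E(n_{11}) = 300 \times \frac{147}{300} \times \frac{155}{300} = \frac{147 \cdot 155}{300}.$$

This makes the corresponding contribution to the $\chi^2$ statistic 

$$\frac{(n_{11} - E(n_{11}))^2}{E(n_{11})} = \frac{\left(69 - \frac{147 \cdot 155}{300}\right)^2}{\frac{147 \cdot 155}{300}}.$$

The full $\chi^2$ statistic in this example is 

$$
\begin{aligned}
\chi^2 &= \frac{(n_{11} - E(n_{11}))^2}{E(n_{11})} + \frac{(n_{12} - E(n_{12}))^2}{E(n_{12})} + \frac{(n_{21} - E(n_{21}))^2}{E(n_{21})} + \frac{(n_{22} - E(n_{22}))^2}{E(n_{22})} \\
 &= \frac{\left(69 - \frac{147 \cdot 155}{300}\right)^2}{\frac{147 \cdot 155}{300}} + \frac{\left(86 - \frac{153 \cdot 155}{300}\right)^2}{\frac{153 \cdot 155}{300}} + \frac{\left(78 - \frac{147 \cdot 145}{300}\right)^2}{\frac{147 \cdot 145}{300}} + \frac{\left(67 - \frac{153 \cdot 145}{300}\right)^2}{\frac{153 \cdot 145}{300}} \\
 &\approx 2.58.
\end{aligned}
$$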


We mention finally that the above $\chi^2$ has only 1 degree of freedom: this 
is the number of rows minus 1 times the number of columns minus 1. 
The P-value associated with the above result is $P(\chi^2_1 \ge 2.58) = 0.108$. 
Note that this result puts us in somewhat murky waters: it’s small 
(significant) but perhaps not small enough to reject the null hypothesis 
of independence. Maybe another survey is called for! 
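
The computation for the algebra/geometry survey can be reproduced with a few lines of Python. This is a minimal sketch with illustrative variable names (none of them come from the notes); since there is a single degree of freedom, the P-value is obtained from the complementary error function, using $P(\chi^2_1 \ge x) = P(|Z| \ge \sqrt{x})$.

```python
import math

# Observed cell counts from the survey table: rows are Algebra/Geometry,
# columns are Male/Female.
table = [[69, 86],
         [78, 67]]

row_totals = [sum(row) for row in table]           # 155, 145
col_totals = [sum(col) for col in zip(*table)]     # 147, 153
n = sum(row_totals)                                # 300

# Estimated expected count for each cell under independence.
expected = [[r * c / n for c in col_totals] for r in row_totals]

stat = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(2) for j in range(2))

# One degree of freedom, so P(chi-square >= x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(stat / 2.0))

print(round(expected[0][0], 2), round(stat, 2), round(p_value, 3))
```

Note that in a 2 x 2 table every observed count differs from its expected count by the same amount in absolute value (here 6.95), which is one way to see why only one degree of freedom is involved.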


In general, given a two-way contingency table, we wish to assess 
whether the random variables defined by the rows and the columns 
are independent. If the table has $r$ rows and $c$ columns, then we shall 
denote the entries of the table by $n_{ij}$, where $1 \le i \le r$ and $1 \le j \le c$. 
The entries $n_{ij}$ are often referred to as the cell counts. The sum of 
all the cell counts is the total number $n$ in the sample. We denote by 
$C_1, C_2, \ldots, C_c$ the column sums and by $R_1, R_2, \ldots, R_r$ the row sums. 
Then in analogy with the above example, the contribution to the $\chi^2$ 
statistic from the $(i, j)$ table cell is $\left(n_{ij} - \frac{R_i C_j}{n}\right)^2 \Big/ \frac{R_i C_j}{n}$, as under the null 
hypothesis of independence of the random variables defined by the rows 
and the columns, the fraction $\frac{R_i C_j}{n}$ represents the expected cell count. 
The complete $\chi^2$ statistic is given by the sum of the above contributions: 

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{\left(n_{ij} - \frac{R_i C_j}{n}\right)^2}{\frac{R_i C_j}{n}},$$

and has $(r - 1)(c - 1)$ degrees of freedom. 


EXAMPLE 3. It is often contended that one’s physical health is dependent 
upon one’s material wealth, which we’ll simply equate with one’s 
salary. So suppose that a survey of 895 male adults resulted in the 
following contingency table: 


                 Salary (in thousands U.S.$) 
Health    | 15–29 | 30–39 | 40–59 | ≥ 60 | Totals 
Fair      | 52    | 35    | 76    | 63   | 226 
Good      | 89    | 83    | 78    | 82   | 332 
Excellent | 88    | 83    | 85    | 81   | 337 
Totals    | 229   | 201   | 239   | 226  | 895 


One computes $\chi^2 = 13.840$. Since $P(\chi^2_6 \ge 13.840) = 0.031$, one infers a 
significant deviation from what one would expect if the variables really 
were independent. Therefore, we reject the independence assumption. 
Of course, we still can’t say any more about the “nature” of the de- 
pendency of the salary variable and the health variable. More detailed 
analyses would require further samples and further studies! 
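
For a general $r \times c$ table, the expected counts $R_i C_j / n$, the statistic, and the degrees of freedom can be computed as in the following Python sketch. The function names `independence_test` and `upper_tail_even_df` are hypothetical. The tail probability uses a closed form that holds only when the number of degrees of freedom is even, as it is for the health/salary table; this is a convenience chosen for the sketch, not a general method.

```python
import math

def independence_test(table):
    """Return (statistic, degrees of freedom, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count in cell (i, j) under independence is R_i * C_j / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

def upper_tail_even_df(x, df):
    """P(chi-square >= x) for even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!."""
    y = x / 2.0
    return math.exp(-y) * sum(y ** j / math.factorial(j) for j in range(df // 2))

# Cell counts from the health/salary table (rows: Fair, Good, Excellent).
health_by_salary = [[52, 35, 76, 63],
                    [89, 83, 78, 82],
                    [88, 83, 85, 81]]

stat, df, expected = independence_test(health_by_salary)
p_value = upper_tail_even_df(stat, df)
print(round(stat, 3), df, round(p_value, 4))
```

The largest contributions come from the “Fair” row, in the 30–39 and 40–59 salary columns, which hints at where the departure from independence lies without settling its nature.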


We mention finally that the above can be handled relatively easily 
by the TI calculator $\chi^2$ test. This test requires a single matrix input, $A$, 
where, in this case, $A$ would be the cell counts in the above contingency 
table. The TI calculator will automatically generate from the matrix 
$A$ a secondary matrix $B$ consisting of the expected counts. Invoking 
the $\chi^2$ test using the matrix 

$$A = \begin{bmatrix} 52 & 35 & 76 & 63 \\ 89 & 83 & 78 & 82 \\ 88 & 83 & 85 & 81 \end{bmatrix}$$

results in the output 


χ²-Test 
χ² = 13.83966079 
P = .0314794347 
df = 6 


EXERCISES 


1. The TI command randInt(0,9) will randomly generate an integer 
(a “digit”) between 0 and 9. Having nothing better to do, we 
invoke this command 200 times, resulting in the table: 


digit     | 0  | 1  | 2  | 3  | 4  | 5           | 6  | 7  | 8  | 9 
frequency | 17 | 21 | 15 | 19 | 25 | [illegible] | 19 | 23 | 18 | [illegible] 


We suspect that the command randInt ought to generate random 
digits uniformly, leading to the null hypothesis 

$$H_0:\ p_0 = p_1 = \cdots = p_9 = \frac{1}{10},$$

where $p_i$ is the probability of generating digit $i$, $i = 0, 1, 2, \ldots, 9$. 
Test this hypothesis against its negation at the 5% significance 
level. 


2.³³ Eggs at a farm are sold in boxes of six. Each egg is either brown 
or white. The owner believes that the number of brown eggs in a 
box can be modeled by a binomial distribution. He examines 100 
boxes and obtains the following data: 


Number of brown eggs in a box | Frequency 
0                             | 10 
1                             | 29 
2                             | 31 
3                             | 18 
4                             | 8 
5                             | 3 
6                             | 1 


(a) Estimate the percentage $p$ of brown eggs in the population of 
all eggs. 


(b) How well does the binomial distribution with parameter $p$ 
model the above data? Test at the 5% level. 


³³ Adapted from IB Mathematics HL Examination, Nov 2003, Paper 2 (Statistics), #6 (iv). 


3. Suppose you take six coins and toss them simultaneously 100 times, leading 
to the data below: 


Number of heads obtained | Frequency | Expected under $H_0$ 
0                        | 0         | 
[rows for 1 through 6 heads are illegible in the source] 


Suppose that I tell you that of these six coins, five are fair and 
one has two heads. Test this as a null hypothesis at the 5% level. 
(Start by filling in the expected counts under the appropriate null 
hypothesis.) 


4. Here’s a more extended exercise. In Exercise 18 on page 344 it 
was suggested that the histogram representing the number of trials 
needed for each of 200 people to obtain all of five different prizes 
bears a resemblance to the Poisson distribution. Use the TI 
code given in part (c) to generate your own data, and then use a 
$\chi^2$ test to compare the goodness of a Poisson fit. (Note that the 
mean waiting time for five prizes is $\mu = \frac{137}{12}$.) 


5. People often contend that divorce rates are, in some sense, related 
to one’s religious affiliation. Suppose that a survey resulted in 
the following data, exhibited in the following two-way contingency 
table: 


                       Religious Affiliation 
Marital History | A  | B   | C  | None | Totals 
Divorced        | 21 | 32  | 15 | 32   | 100 
Never Divorced  | 78 | 90  | 34 | 90   | 292 
Totals          | 99 | 122 | 49 | 122  | 392 


Formulate an appropriate null hypothesis and test this at the 5% 
level. 


6. (Here’s a cute one!)³⁴ The two-way contingency table below compares 
the level of education of a sample of Kansas pig farmers with 
the sizes of their farms, measured in number of pigs. Formulate 
and test an appropriate null hypothesis at the 5% level. 


                        Education Level 
Farm Size        | No College | College | Totals 
<1,000 pigs      | 42         | 53      | 95 
1,000–2,000 pigs | 27         | 42      | 69 
2,001–5,000 pigs | 22         | 20      | 42 
>5,000 pigs      | 27         | 29      | 56 
Totals           | 118        | 144     | 262 


³⁴ Adapted from STATISTICS, Ninth edition, James T. McClave and Terry Sincich, Prentice Hall, 
2003, page 726, problem #13.26. 


[Figure: $t$ density curve, with the upper-tail area to the right of $t^*$ labeled “Probability $p$”] 

Table entry for $p$ and $C$ is the critical value $t^*$ with 
probability $p$ lying to its right and probability $C$ lying between 
$-t^*$ and $t^*$. 

TABLE C  $t$ distribution critical values 

                                   Upper tail probability p 
df | .25   | .20   | .15   | .10   | .05   | .025  | .02   | .01   | .005  | .0025 | .001  | .0005 
1  | 1.000 | 1.376 | 1.963 | 3.078 | 6.314 | 12.71 | 15.89 | 31.82 | 63.66 | 127.3 | 318.3 | 636.6 
2  | 0.816 | 1.061 | 1.386 | 1.886 | 2.920 | 4.303 | 4.849 | 6.965 | 9.925 | 14.09 | 22.33 | 31.60 
3  | 0.765 | 0.978 | 1.250 | 1.638 | 2.353 | 3.182 | 3.482 | 4.541 | 5.841 | 7.453 | 10.21 | 12.92 
4  | 0.741 | 0.941 | 1.190 | 1.533 | 2.132 | 2.776 | 2.999 | 3.747 | 4.604 | 5.598 | 7.173 | 8.610 
5  | 0.727 | 0.920 | 1.156 | 1.476 | 2.015 | 2.571 | 2.757 | 3.365 | 4.032 | 4.773 | 5.893 | 6.869 
6  | 0.718 | 0.906 | 1.134 | 1.440 | 1.943 | 2.447 | 2.612 | 3.143 | 3.707 | 4.317 | 5.208 | 5.959 
7  | 0.711 | 0.896 | 1.119 | 1.415 | 1.895 | 2.365 | 2.517 | 2.998 | 3.499 | 4.029 | 4.785 | 5.408 
8  | 0.706 | 0.889 | 1.108 | 1.397 | 1.860 | 2.306 | 2.449 | 2.896 | 3.355 | 3.833 | 4.501 | 5.041 
9  | 0.703 | 0.883 | 1.100 | 1.383 | 1.833 | 2.262 | 2.398 | 2.821 | 3.250 | 3.690 | 4.297 | 4.781 
10 | 0.700 | 0.879 | 1.093 | 1.372 | 1.812 | 2.228 | 2.359 | 2.764 | 3.169 | 3.581 | 4.144 | 4.587 
11 | 0.697 | 0.876 | 1.088 | 1.363 | 1.796 | 2.201 | 2.328 | 2.718 | 3.106 | 3.497 | 4.025 | 4.437 
12 | 0.695 | 0.873 | 1.083 | 1.356 | 1.782 | 2.179 | 2.303 | 2.681 | 3.055 | 3.428 | 3.930 | 4.318 
13 | 0.694 | 0.870 | 1.079 | 1.350 | 1.771 | 2.160 | 2.282 | 2.650 | 3.012 | 3.372 | 3.852 | 4.221 
14 | 0.692 | 0.868 | 1.076 | 1.345 | 1.761 | 2.145 | 2.264 | 2.624 | 2.977 | 3.326 | 3.787 | 4.140 
15 | 0.691 | 0.866 | 1.074 | 1.341 | 1.753 | 2.131 | 2.249 | 2.602 | 2.947 | 3.286 | 3.733 | 4.073 
16 | 0.690 | 0.865 | 1.071 | 1.337 | 1.746 | 2.120 | 2.235 | 2.583 | 2.921 | 3.252 | 3.686 | 4.015 
17 | 0.689 | 0.863 | 1.069 | 1.333 | 1.740 | 2.110 | 2.224 | 2.567 | 2.898 | 3.222 | 3.646 | 3.965 
18 | 0.688 | 0.862 | 1.067 | 1.330 | 1.734 | 2.101 | 2.214 | 2.552 | 2.878 | 3.197 | 3.611 | 3.922 
19 | 0.688 | 0.861 | 1.066 | 1.328 | 1.729 | 2.093 | 2.205 | 2.539 | 2.861 | 3.174 | 3.579 | 3.883 
20 | 0.687 | 0.860 | 1.064 | 1.325 | 1.725 | 2.086 | 2.197 | 2.528 | 2.845 | 3.153 | 3.552 | 3.850 
21 | 0.686 | 0.859 | 1.063 | 1.323 | 1.721 | 2.080 | 2.189 | 2.518 | 2.831 | 3.135 | 3.527 | 3.819 
22 | 0.686 | 0.858 | 1.061 | 1.321 | 1.717 | 2.074 | 2.183 | 2.508 | 2.819 | 3.119 | 3.505 | 3.792 
23 | 0.685 | 0.858 | 1.060 | 1.319 | 1.714 | 2.069 | 2.177 | 2.500 | 2.807 | 3.104 | 3.485 | 3.768 